


THE ANNALS
of
MATHEMATICAL
STATISTICS


THE OFFICIAL JOURNAL OF THE INSTITUTE OF
MATHEMATICAL STATISTICS


VOLUME XIII


1942





THE ANNALS 
OF MATHEMATICAL STATISTICS 


EDITED BY 
S. S. WILKS, Editor 


A. T. CRAIG J. NEYMAN 


WITH THE COOPERATION OF 


H. C. Carver R. A. Fisher R. von Mises
H. Cramér T. C. Fry E. S. Pearson
W. E. Deming H. Hotelling H. L. Rietz

G. Darmois W. A. Shewhart


The ANNALS OF MATHEMATICAL STATISTICS is published quarterly by the
Institute of Mathematical Statistics, Mt. Royal & Guilford Aves., Baltimore,
Md. Subscriptions, renewals, orders for back numbers and other business
communications should be sent to the ANNALS OF MATHEMATICAL STATISTICS, Mt.
Royal & Guilford Aves., Baltimore, Md., or to the Secretary of the Insti- 
tute of Mathematical Statistics, E. G. Olds, Carnegie Institute of Technology, 
Pittsburgh, Pa. Changes in mailing address which are to become effective for 
a given issue should be reported to the Secretary on or before the 15th of the 
month preceding the month of that issue. The months of issue are March, 
June, September and December. 


Manuscripts for publication in the ANNALS OF MATHEMATICAL STATISTICS 
should be sent to S. S. Wilks, Fine Hall, Princeton, New Jersey. Manuscripts
should be typewritten double-spaced with wide margins, and the original copy 
should be submitted. Footnotes should be reduced to a minimum and whenever 
possible replaced by a bibliography at the end of the paper; formulae in foot- 
notes should be avoided. Figures, charts, and diagrams should be drawn on 
plain white paper or tracing cloth in black India ink twice the size they are to 
be printed. Authors are requested to keep in mind typographical difficulties 
of complicated mathematical formulae. 


Authors will ordinarily receive only galley proofs. Fifty reprints without 
covers will be furnished free. Additional reprints and covers furnished at cost. 


The subscription price for the ANNALS is $5.00 per year. Single copies $1.50. 
Back numbers are available at $5.00 per volume, or $1.50 per single issue. 


COMPOSED AND PRINTED AT THE 
WAVERLY PRESS, Inc. 
BALTIMORE, MD., U. S. A.



















THE ANNALS 
of 
MATHEMATICAL 
STATISTICS 


(FOUNDED BY H. C. CARVER) 


THE OFFICIAL JOURNAL OF THE INSTITUTE
OF MATHEMATICAL STATISTICS 


Contents 


Distribution of the Serial Correlation Coefficient. R. L. ANDERSON .... 1
Serial Correlation and Quadratic Forms in Normal Variables.
  TJALLING KOOPMANS .... 14
A Generalized Analysis of Variance. FRANKLIN E. SATTERTHWAITE .... 34
Distributions in Stratified Sampling. PAUL H. ANDERSON .... 42
Solution of a Mathematical Problem Connected with the Theory
  of Heredity. S. BERNSTEIN (Translated by EMMA LEHMER) .... 53
Some Recent Advances in Mathematical Statistics, I. BURTON H. CAMP .... 62
Some Recent Advances in Mathematical Statistics, II. C. C. CRAIG .... 74
Notes:
  A Further Remark Concerning the Distribution of the Ratio of the
    Mean Square Successive Difference to the Variance. JOHN VON NEUMANN .... 86
  Convexity Properties of Generalized Mean Value Functions. E. F. BECKENBACH .... 88
  A Characterization of the Normal Distribution. EUGENE LUKACS .... 91
  Note on a Method of Sampling. CARLOS E. DIEULEFAIT .... 94
  A Sequence of Discrete Variables Exhibiting Correlation Due to
    Common Elements. CARL H. FISCHER .... 97
Report of the New York Meeting of the Institute .... 102
Report of the Dallas Meeting of the Institute .... 106
Annual Report of the Secretary-Treasurer of the Institute .... 107
[Entry illegible] .... 110





Vol. XIII, No. 1, March, 1942




Entered as second-class matter at the Post Office at Baltimore, Maryland, under the Act of March 3, 1879 

















DISTRIBUTION OF THE SERIAL CORRELATION COEFFICIENT 
By R. L. ANDERSON 
North Carolina State College 


1. Introduction. The problem of serial correlation was brought to the atten- 
tion of statisticians by Yule in 1921 [9]. Both Yule and Bartlett [2] have shown 
that the ordinary tests of significance are invalidated if successive observations 
are not independent of one another. The serial correlation coefficient has been 
introduced as a measure of the relationship between successive values of a 
variable ordered in time or space. Interest in the serial correlation problem was 
stimulated further by the new concepts of time series analysis discussed by 
Wold [8]. 


We shall define the serial correlation coefficient for lag L and N observations
to be¹

$${}_LR_N = \frac{{}_LC_N}{V_N} = \frac{X_1X_{L+1} + X_2X_{L+2} + \cdots + X_NX_L - (\Sigma X_i)^2/N}{\Sigma X_i^2 - (\Sigma X_i)^2/N},$$

where C and V are the covariance and variance respectively and the X’s are 
considered to be independently normally distributed about the same mean with 
unit variance. If the population variance were known a priori, the variates 
could be transformed so that they would have unit variance; under such an 
unusual circumstance, the only distribution required would be that of the serial 
covariance. Tintner has given a test of significance for the serial covariance [6]
and for the correlation coefficient [7] by using a method of selected items. The
author has presented the distribution of the serial covariance and of the serial
correlation coefficient not corrected for the mean in a recent doctoral thesis [1].
The distributions of ${}_LR_N$ not corrected for the mean will be mentioned in the
sections which follow.


2. Small sample distributions for lag 1. W. G. Cochran has suggested that
we use a result given in his article on quadratic forms to derive the distributions
of the serial correlation coefficient for small samples [3]. If $X_1, X_2, \cdots, X_N$
are independently normally distributed with variance 1 and mean 0, then

"Every quadratic form $\Sigma a_{ij}X_iX_j$ is distributed like $\sum_{k=1}^{r} \lambda_k u_k$, where r is the
rank of the matrix, A, of the quadratic form, the u's are independently
distributed as $\chi^2$, each with 1 d.f., and the $\lambda$'s are the non-zero latent roots
of the characteristic equation of A" [3, p. 179].

If each $\lambda_i$ appears $k_i$ times as a latent root, $u_i$ will be distributed as $\chi^2$ with $k_i$
degrees of freedom.


1 This circular definition of the serial correlation coefficient was suggested by H. 
Hotelling. 




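If we set L = 1 in the above definition of the serial covariance, we note that
the characteristic equation of ${}_1C_N$ is

$${}_1F_N = \begin{vmatrix} a_1 & a_2 & a_3 & \cdots & a_N \\ a_N & a_1 & a_2 & \cdots & a_{N-1} \\ \vdots & & & & \vdots \\ a_2 & a_3 & a_4 & \cdots & a_1 \end{vmatrix} = 0,$$

where $a_1 = -(\lambda + 1/N)$, $a_2 = a_N = (N - 2)/2N$, and all other a's $= -1/N$.
The determinant can be evaluated by the method of circulants. We find that

$${}_1F_N(\lambda) = \prod_{k=1}^{N} \Big\{ \sum_{t=1}^{N} a_t\, \omega_k^{t-1} \Big\},$$

where $\omega_k = e^{2\pi i k/N}$ is the kth root of unity. Hence,

$${}_1F_N = \prod_{k=1}^{N} \Big\{ -\Big(\lambda + \frac{1}{N}\Big) + \frac{N-2}{2N}\,(\omega_k + \omega_k^{N-1}) - \frac{1}{N} \sum_{t=3}^{N-1} \omega_k^{t-1} \Big\},$$

$$\sum_{t=3}^{N-1} \omega_k^{t-1} = \begin{cases} -(\omega_k + 1 + \omega_k^{N-1}), & \text{for } k \ne N, \\ (N - 3), & \text{for } k = N. \end{cases}$$

The factor for k = N reduces to $-\lambda$ and gives only the zero root, so that the
non-zero latent roots satisfy

$${}_1F_N = \prod_{k=1}^{N-1} \{ -\lambda_k + (\omega_k + \omega_k^{N-1})/2 \} = \prod_{k=1}^{N-1} \Big\{ -\lambda_k + \cos\frac{2\pi k}{N} \Big\} = 0.$$

Hence $\lambda_k = \cos(2\pi k/N)$, (k = 1, 2, ..., N - 1), and

$${}_1C_N = \begin{cases} \sum_{k=1}^{\frac12(N-1)} \lambda_k u_k, & \text{for N odd}, \\ \sum_{k=1}^{\frac12(N-2)} \lambda_k u_k - u, & \text{for N even}, \end{cases}$$

where $u_k$ is distributed as $\chi^2$ with 2 d.f. and u with 1 d.f. At the same time,
we note that $V_N = \Sigma(X_i - \bar X)^2$ is distributed as $\chi^2$ with N - 1 d.f.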
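The latent roots quoted above can be checked numerically. The following is a minimal sketch in Python (standard library only), not part of the original derivation; the function names are illustrative. It applies the matrix of the circular serial covariance to each vector $(\omega_k^0, \omega_k^1, \cdots, \omega_k^{N-1})$ and confirms that the vector is reproduced with the factor $\cos(2\pi Lk/N)$, while the vector of ones gives the zero root.

```python
import cmath
import math


def apply_serial_covariance(x, lag=1):
    """Return A x, where x'Ax = sum_i x_i x_{i+lag} (circular) - (sum x)^2 / N."""
    n = len(x)
    mean_term = sum(x) / n
    return [(x[(i + lag) % n] + x[(i - lag) % n]) / 2 - mean_term
            for i in range(n)]


def serial_covariance(x, lag=1):
    """Circular serial covariance C as defined in the text (a sum, not an average)."""
    return sum(xi * axi for xi, axi in zip(x, apply_serial_covariance(x, lag)))


def roots_are_cosines(n, lag=1, tol=1e-9):
    """Check A v_k = cos(2 pi lag k / n) v_k for k = 1..n-1, and A v_0 = 0."""
    for k in range(n):
        v = [cmath.exp(2j * math.pi * k * i / n) for i in range(n)]
        root = 0.0 if k == 0 else math.cos(2 * math.pi * lag * k / n)
        av = apply_serial_covariance(v, lag)
        if any(abs(a - root * b) > tol for a, b in zip(av, v)):
            return False
    return True
```

For instance, `roots_are_cosines(6)` and `roots_are_cosines(7)` both hold, and `serial_covariance([1.0, 2.0, 3.0, 4.0])` evaluates the definition directly as 24 - 100/4.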

The general procedure in deriving the distribution of ${}_1R_N$ is as follows: We
determine the joint density function of the u's which form the distributions of
${}_1C_N (= {}_1R_N \cdot V_N)$ and $V_N$. The u's are integrated out, leaving the joint density
function of ${}_1R_N$ and $V_N$. The distribution of ${}_1R_N$ is obtained by integrating
with respect to $V_N$ from 0 to $\infty$. As examples, derivations of the distributions
of ${}_1R_6$ and ${}_1R_7$ have been included. In order to simplify the results, the first
subscripts have been dropped from $R_N$.²

Distribution of $R_6$. $R_6V_6 = \lambda_1u_1 + \lambda_2u_2 - u$ and $V_6 = u_1 + u_2 + u$, where
$u_1$ and $u_2$ are distributed as $\chi^2$ with 2 d.f. and u with 1 d.f. and $\lambda_1 = \frac12$ and
$\lambda_2 = -\frac12$. Hence the density function of the u's is

$$D(u_1, u_2, u) = (4\sqrt{2\pi})^{-1}\, u^{-\frac12}\, e^{-\frac12 V_6}.$$










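Since $u_1 = [V_6(R_6 - \lambda_2) + u(1 + \lambda_2)]/(\lambda_1 - \lambda_2)$ and
$u_2 = [V_6(\lambda_1 - R_6) - u(1 + \lambda_1)]/(\lambda_1 - \lambda_2)$,
u must vary between 0 and $V_6(\lambda_1 - R_6)/(1 + \lambda_1)$ for $\lambda_2 < R_6 < \lambda_1$ and between
$V_6(\lambda_2 - R_6)/(1 + \lambda_2)$ and $V_6(\lambda_1 - R_6)/(1 + \lambda_1)$ for $-1 < R_6 < \lambda_2$. After
integrating with respect to u between these limits and then with respect to $V_6$
from 0 to $\infty$, we obtained the following density function for $R_6$:

$$D(R_6) = \frac{3}{2} \begin{cases} \dfrac{\sqrt{\lambda_1 - R_6}}{\sqrt{1 + \lambda_1}\,(\lambda_1 - \lambda_2)}, & \lambda_2 < R_6 < \lambda_1, \\[2ex] \dfrac{\sqrt{\lambda_1 - R_6}}{\sqrt{1 + \lambda_1}\,(\lambda_1 - \lambda_2)} + \dfrac{\sqrt{\lambda_2 - R_6}}{\sqrt{1 + \lambda_2}\,(\lambda_2 - \lambda_1)}, & -1 < R_6 < \lambda_2. \end{cases}$$

The cumulative probability function has the same general form:

$$P(R_6 > R') = \begin{cases} \dfrac{(\lambda_1 - R')^{\frac32}}{\sqrt{1 + \lambda_1}\,(\lambda_1 - \lambda_2)} + \dfrac{(\lambda_2 - R')^{\frac32}}{\sqrt{1 + \lambda_2}\,(\lambda_2 - \lambda_1)}, & \text{for } -1 < R' < \lambda_2, \\[2ex] \dfrac{(\lambda_1 - R')^{\frac32}}{\sqrt{1 + \lambda_1}\,(\lambda_1 - \lambda_2)}, & \text{for } \lambda_2 < R' < \lambda_1. \end{cases}$$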
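The intermediate steps of this integration are not written out in the text. A sketch of them, under the change of variables from $(u_1, u_2, u)$ to $(R_6, V_6, u)$ given above, runs as follows; the bounds a and b are shorthand introduced here for the limits on u.

```latex
% Jacobian of (u_1, u_2) with respect to (R_6, V_6), u held fixed: V_6/(\lambda_1 - \lambda_2)
D(R_6, V_6, u) = \frac{V_6\, u^{-1/2}\, e^{-V_6/2}}{4\sqrt{2\pi}\,(\lambda_1 - \lambda_2)}

% Integrate u from a to b, where b = V_6(\lambda_1 - R_6)/(1 + \lambda_1) and
% a = 0 for \lambda_2 < R_6 < \lambda_1, a = V_6(\lambda_2 - R_6)/(1 + \lambda_2) for -1 < R_6 < \lambda_2
\int_a^b u^{-1/2}\, du = 2\,(\sqrt{b} - \sqrt{a})

% Both bounds carry a factor \sqrt{V_6}; integrate V_6 from 0 to infinity
\int_0^\infty V_6^{3/2}\, e^{-V_6/2}\, dV_6 = 2^{5/2}\,\Gamma(5/2) = 3\sqrt{2\pi}

% Result for \lambda_2 < R_6 < \lambda_1 (the lower range adds the term in \sqrt{a})
D(R_6) = \frac{2 \cdot 3\sqrt{2\pi}}{4\sqrt{2\pi}}\,
         \frac{\sqrt{\lambda_1 - R_6}}{\sqrt{1 + \lambda_1}\,(\lambda_1 - \lambda_2)}
       = \frac{3}{2}\,\frac{\sqrt{\lambda_1 - R_6}}{\sqrt{1 + \lambda_1}\,(\lambda_1 - \lambda_2)}
```

With $\lambda_1 = \frac12$ and $\lambda_2 = -\frac12$ the cumulative function equals $\frac32 - \frac12 = 1$ at $R' = -1$, as it should.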

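Distribution of $R_7$. $R_7V_7 = \lambda_1u_1 + \lambda_2u_2 + \lambda_3u_3$ and $V_7 = u_1 + u_2 + u_3$,
where each u is distributed as $\chi^2$ with 2 d.f. Hence,

$$u_1 = \frac{V_7(R_7 - \lambda_2) + u_3(\lambda_2 - \lambda_3)}{\lambda_1 - \lambda_2} \quad \text{and} \quad u_2 = \frac{V_7(\lambda_1 - R_7) - u_3(\lambda_1 - \lambda_3)}{\lambda_1 - \lambda_2}.$$

For $\lambda_2 < R_7 < \lambda_1$, $0 < u_3 < V_7(\lambda_1 - R_7)/(\lambda_1 - \lambda_3)$; for $\lambda_3 < R_7 \le \lambda_2$,
$V_7(\lambda_2 - R_7)/(\lambda_2 - \lambda_3) \le u_3 \le V_7(\lambda_1 - R_7)/(\lambda_1 - \lambda_3)$. Using these limits, we
derived the following density function for $R_7$:

$$D(R_7) = 2 \begin{cases} \dfrac{\lambda_1 - R_7}{(\lambda_1 - \lambda_2)(\lambda_1 - \lambda_3)} + \dfrac{\lambda_2 - R_7}{(\lambda_2 - \lambda_1)(\lambda_2 - \lambda_3)}, & \text{for } \lambda_3 \le R_7 \le \lambda_2, \\[2ex] \dfrac{\lambda_1 - R_7}{(\lambda_1 - \lambda_2)(\lambda_1 - \lambda_3)}, & \text{for } \lambda_2 \le R_7 \le \lambda_1. \end{cases}$$

The cumulative probability function is similar, except that the coefficient 2
cancels and the exponent of each numerator is raised by one.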

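General formulas for N odd. It appears that the density function for $R_N$ and
$V_N$ for N odd is

$$D(R_N, V_N) = K V_N^{\frac12(N-3)} e^{-\frac12 V_N} \sum_{i=1}^{m} (\lambda_i - R_N)^{\frac12(N-5)}/\alpha_i \quad \text{for } \lambda_{m+1} \le R_N \le \lambda_m,$$

where $\alpha_i = \prod_{j=1}^{\frac12(N-1)} (\lambda_i - \lambda_j)$ for $j \ne i$ and $1/K = 2^{\frac12(N-1)}\,\Gamma[\frac12(N - 3)]$. This
formula holds for N = 5 and 7; we will show that it holds for N + 2, assuming
it true for N. If we set $k = \frac12(N + 1)$, $R_{N+2}V_{N+2} = R_NV_N + \lambda_ku_k$ and $V_{N+2} =
V_N + u_k$; hence,

$$R_N = \frac{R_{N+2}V_{N+2} - \lambda_k u_k}{V_{N+2} - u_k} \quad \text{and} \quad V_N = V_{N+2} - u_k.$$

If we make the substitution $u_k = u_k' V_{N+2}$, the density function for $u_k'$, $V_{N+2}$,
and $R_{N+2}$ is

$$\tfrac12 K V_{N+2}^{\frac12(N-1)} e^{-\frac12 V_{N+2}} \sum_{i=1}^{m} [(\lambda_i - R_{N+2}) - u_k'(\lambda_i - \lambda_k)]^{\frac12(N-5)}/\alpha_i.$$

² Note that we are omitting the lag subscript from ${}_1R_N$.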





In order to obtain the distribution of $V_{N+2}$ and $R_{N+2}$, we must integrate out $u_k'$.
The limits of integration differ for different values of m. We note that

$$u_k' = (R_N - R_{N+2})/(R_N - \lambda_k);$$

except that $u_k' = 0$ when $\lambda_k < R_N < \lambda_{m+1}$, since $\lambda_{m+1} < R_{N+2} < \lambda_m$ and $u_k'$
can not be negative. For $R_{N+2} > \lambda_k$, $u_k' < 1$; hence, if $R_N$ is replaced by a
larger (smaller) quantity, $u_k'$ will be larger (smaller).

For m = 1 ($\lambda_2 < R_{N+2} < \lambda_1$), we need to consider only that region for which
$\lambda_2 < R_N < \lambda_1$. In this region, $0 < u_k' < (\lambda_1 - R_{N+2})/(\lambda_1 - \lambda_k)$ and the density
function of $R_{N+2}$ and $V_{N+2}$ is

$$\phi(V_{N+2})\,(\lambda_1 - R_{N+2})^{\frac12(N-3)}/\alpha_1,$$

where $\phi(V_{N+2}) = V_{N+2}^{\frac12(N-1)} e^{-\frac12 V_{N+2}} / 2^{\frac12(N+1)}\,\Gamma[\frac12(N - 1)]$ and $\alpha_i = \prod_{j=1}^{k} (\lambda_i - \lambda_j)$, $j \ne i$.
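For m = 2 ($\lambda_3 < R_{N+2} < \lambda_2$), we must consider two regions in the $R_N$ plane.
When $\lambda_2 < R_N < \lambda_1$,

$$\frac{\lambda_2 - R_{N+2}}{\lambda_2 - \lambda_k} < u_k' < \frac{\lambda_1 - R_{N+2}}{\lambda_1 - \lambda_k},$$

and when $\lambda_3 < R_N < \lambda_2$, $0 < u_k' < (\lambda_2 - R_{N+2})/(\lambda_2 - \lambda_k)$. If we combine the
density functions for these two regions, we find that

$$D(R_{N+2}, V_{N+2}) = \phi(V_{N+2}) \sum_{i=1}^{2} (\lambda_i - R_{N+2})^{\frac12(N-3)}/\alpha_i \quad \text{for } \lambda_3 < R_{N+2} < \lambda_2.$$

Similar results can be obtained for the other regions.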
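Finally we conclude that for N odd,

$$D({}_1R_N) = \tfrac12(N - 3) \sum_{i=1}^{m} (\lambda_i - {}_1R_N)^{\frac12(N-5)}/\alpha_i \quad \text{for } \lambda_{m+1} \le {}_1R_N \le \lambda_m$$

and

$$P({}_1R_N > R') = \sum_{i=1}^{m} (\lambda_i - R')^{\frac12(N-3)}/\alpha_i \quad \text{for } \lambda_{m+1} \le R' \le \lambda_m,$$

where $\alpha_i = \prod_{j=1}^{\frac12(N-1)} (\lambda_i - \lambda_j)$, $i \ne j$. The general density function for N odd and
${}_1R_N$ not corrected for the sample mean is [1]

$$D({}_1R_N) = \tfrac12(N - 2) \sum_{i=m}^{\frac12(N-1)} (R_N - \lambda_i)^{\frac12(N-4)}/\alpha_i \quad \text{for } \lambda_m \le R_N \le \lambda_{m-1},$$

where $\alpha_i = \prod_{j=1}^{\frac12(N-1)} (\lambda_j - \lambda_i)\,\sqrt{1 - \lambda_i}$, $i \ne j$.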

General formulas for N even. Using the same method as above, we can show
that the same formulas hold for N even and ${}_1R_N$ corrected for the mean except
that in this case $\alpha_i = \prod_{j=1}^{\frac12(N-2)} (\lambda_i - \lambda_j)\,\sqrt{\lambda_i + 1}$, $j \ne i$. No general formulas
were derived for N even and ${}_1R_N$ not corrected for the mean.
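As a check on the general formulas, the cumulative function $P({}_1R_N > R') = \sum_{i \le m} (\lambda_i - R')^{\frac12(N-3)}/\alpha_i$ can be evaluated directly and inverted for a significance point. The sketch below is in Python (standard library only); the function names and the use of bisection are choices made here for illustration, not the tabulation method of the thesis [1], and the formula is taken as read from this section (with the extra factor $\sqrt{\lambda_i + 1}$ for N even).

```python
import math


def latent_roots(n):
    """Distinct double roots cos(2*pi*k/n), k = 1..(n-1)//2, in decreasing order."""
    return [math.cos(2 * math.pi * k / n) for k in range(1, (n - 1) // 2 + 1)]


def upper_tail(r, n):
    """P(R_N > r) for lag 1, corrected for the mean: sum over the roots above r."""
    lam = latent_roots(n)
    total = 0.0
    for i, li in enumerate(lam):
        if li <= r:
            break  # the remaining roots are smaller still
        alpha = 1.0
        for j, lj in enumerate(lam):
            if j != i:
                alpha *= li - lj
        if n % 2 == 0:
            alpha *= math.sqrt(1.0 + li)  # extra factor for N even
        total += (li - r) ** ((n - 3) / 2) / alpha
    return total


def significance_point(n, p, upper=True):
    """Solve P(R_N > r) = p (upper tail) or P(R_N < r) = p (lower tail) by bisection."""
    target = p if upper else 1.0 - p
    lam = latent_roots(n)
    lo = -1.0 if n % 2 == 0 else lam[-1]  # lower end of the range of R_N
    hi = lam[0]
    for _ in range(100):
        mid = (lo + hi) / 2
        if upper_tail(mid, n) > target:  # the tail probability decreases with r
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For N = 6 this gives upper 5% and 1% points of about 0.345 and 0.447, and for N = 7 about 0.370 and 0.510, in agreement with Table I. For large N the alternating sum loses accuracy in floating point, so the sketch is intended for small samples only.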


3. Large sample distributions for lag 1. The simultaneous density function
of C and V, where we will drop the subscripts for convenience, is

$$D(C, V) = (2\pi)^{-2} \int\!\!\int \phi(s, t)\, e^{-sC - tV}\, ds\, dt,$$

$$\phi(s, t) = K \int \cdots \int e^{-\frac12\theta}\, dX_1 \cdots dX_N,$$

where $\theta = \{\Sigma X_i^2 - 2t[\Sigma(X_i - \bar X)^2] - 2s[X_1X_2 + \cdots + X_NX_1 - (\Sigma X_i)^2/N]\}$
and s and t are pure imaginaries.

$\phi(s, t) = \Delta^{-\frac12}$, where $\Delta$ is the determinant of the quadratic form $\theta$. This
determinant was evaluated by the method of circulants; we found that

$$\Delta = \prod_{k=1}^{N-1} \{1 - 2(t + s\lambda_k)\}, \quad \text{where } \lambda_k = \cos 2\pi k/N.$$
kewl 





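Set $K = \log \phi(s, t) = \sum \kappa_{ij}\, s^i t^j/(i!\, j!)$. If K is expanded in series, we find that
$\kappa_{ij} = m!\, 2^m \sum_{k=1}^{N-1} \lambda_k^i$, where m = (i + j - 1). For N > i, we might indicate these
summations: $\Sigma\lambda_k = -1$, $\Sigma\lambda_k^2 = \frac12(N - 2)$, $\Sigma\lambda_k^3 = -1$, $\Sigma\lambda_k^4 = \frac18(3N - 8)$ and
$\Sigma\lambda_k^5 = -1$. Hence $\kappa_{10} = E(C) = -1$, $\kappa_{01} = E(V) = (N - 1)$, $\kappa_{20} = \sigma_C^2 =
(N - 2)$, $\kappa_{02} = \sigma_V^2 = 2(N - 1)$, $\kappa_{11} = \rho\sigma_C\sigma_V = -2$, $\kappa_{30} = -8$, $\kappa_{03} = 8(N - 1)$,
$\kappa_{21} = 4(N - 2)$, $\kappa_{12} = -8$, etc.

If we let C' = C + 1 and V' = V - (N - 1), all of these semi-invariants
will remain unchanged except that $\kappa_{10} = \kappa_{01} = 0$. Since R = C/V,

$$\Big(R + \frac{1}{N-1}\Big) = \frac{C'(N - 1) + V'}{(N - 1)^2\,\big(1 + \frac{V'}{N-1}\big)} = \frac{C'(N - 1) + V'}{(N - 1)^2}\,\Big\{1 - \frac{V'}{N-1} + \Big(\frac{V'}{N-1}\Big)^2 - \cdots\Big\}.$$

If we neglect terms of order less than 1/N, $E(R) = -1/(N - 1)$, $E(R - \bar R)^2 =
(N - 2)/(N - 1)^2$ and $E(R - \bar R)^k = 0$ for k > 2. For N < 75, a more exact approxi-
mation may be desired.

If the above approximation is used, ${}_1R_N$ is normally distributed with mean
-1/(N - 1) and variance $(N - 2)/(N - 1)^2$. The single-tail significance
points can be found by substituting in the formulas

$${}_1R_N(.05) = \frac{-1 \pm 1.645\sqrt{N - 2}}{N - 1} \quad \text{or} \quad {}_1R_N(.01) = \frac{-1 \pm 2.326\sqrt{N - 2}}{N - 1}.$$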
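The sums of powers of the latent roots, and the semi-invariants built from them, are easy to confirm numerically. A short sketch in Python (standard library only) follows; the function names are illustrative, and the formula $\kappa_{ij} = m!\,2^m \Sigma\lambda_k^i$ with m = i + j - 1 is taken as read from the text.

```python
import math


def root_power_sum(n, i):
    """Sum over k = 1..n-1 of cos(2*pi*k/n)**i."""
    return sum(math.cos(2 * math.pi * k / n) ** i for k in range(1, n))


def semi_invariant(n, i, j):
    """kappa_ij = m! 2^m * sum of lambda_k^i, with m = i + j - 1 (lag 1)."""
    m = i + j - 1
    return math.factorial(m) * 2 ** m * root_power_sum(n, i)
```

With N = 20, for example, `root_power_sum(20, 2)` equals 9 = (N - 2)/2 and `semi_invariant(20, 2, 1)` equals 72 = 4(N - 2), up to rounding.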

Refer to Fig. 2 for a comparison of the exact distribution and the normal ap- 

proximation for N = 15. I have included the graphs of the exact distributions 

for N = 6 and 7 in Fig. 1. We might note a few comparisons between the 

approximate significance points and the exact ones: 


               Positive tail                      Negative tail
           5%              1%               5%               1%
  N    Exact  Approx.  Exact  Approx.   Exact   Approx.  Exact   Approx.

  45   0.218  0.223    0.314  0.324    -0.262  -0.268   -0.356  -0.369
  75   0.173  0.176    0.250  0.255    -0.199  -0.203   -0.276  -0.282
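The approximate columns of this comparison follow from the normal approximation with mean -1/(N - 1) and variance $(N - 2)/(N - 1)^2$. A minimal Python sketch (standard library only; the function name is illustrative) is:

```python
import math


def normal_approx_point(n, z, upper=True):
    """Significance point of the lag-1 coefficient under the normal approximation.

    z is the one-sided normal deviate (1.645 for 5%, 2.326 for 1%).
    """
    sign = 1.0 if upper else -1.0
    return (-1.0 + sign * z * math.sqrt(n - 2)) / (n - 1)
```

For N = 75, `normal_approx_point(75, 1.645)` is about 0.176 and `normal_approx_point(75, 2.326, upper=False)` is about -0.282, as in the table.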













For ${}_1R_N$ not corrected for the mean, it was found that a quantity y, defined in
terms of ${}_1R_N$ and N by an expression that is illegible in this copy, was
asymptotically normally distributed with mean 0 and variance 1 [1].


4. Significance points of ${}_1R_N$. An example of the methods used in tabulating
these significance points has been presented in the author’s doctoral thesis [1]. 
The significance points for the values of N enclosed in parentheses have been 
obtained by graphical interpolation. Note that N is the number of observations 
(see Table I). 


Fig. 2. Exact distribution of ${}_1R_{15}$ (solid line) and normal approximation (broken line).


5. Distributions for general lag, L. (a) Introduction. For a general lag, L,
the constants in the characteristic equation for the covariance ${}_LC_N$ are $a_1 =
-(\lambda + 1/N)$, $a_{L+1} = a_{N-L+1} = (N - 2)/2N$ and all other a's $= -1/N$. Hence
the characteristic equation is

$${}_LF_N = \prod_{k=1}^{N-1} [\lambda_k - \cos(2\pi Lk/N)] = 0.$$

Certain important generalizations concerning ${}_LF_N$ may be set down:

1. When L is not a factor of N and has no common factor with N, ${}_LF_N = {}_1F_N$.

2. When L and N have a common factor, $\alpha$, ${}_LF_N = ({}_1F_{N/\alpha})^{\alpha}(\lambda - 1)^{\alpha - 1}$.

2a. If $\alpha = L$, ${}_LF_N = ({}_1F_p)^{L}(\lambda - 1)^{L - 1}$, where p = N/L.

The proof of the first statement was suggested by Cochran. Since
$\cos(a + 2\alpha\pi) = \cos a$, where $\alpha$ is any integer, we must prove that the series of
numbers

L, 2L, ..., (N - 1)L,

when reduced modulo N can be arranged to form the series

1, 2, ..., (N - 1).

This proof can be found in most books on the theory of numbers; e.g. [4]. Hence
we conclude that each term of the sequence $\{\cos(2\pi Lk/N)\}$ reduces uniquely
to one of the sequence $\{\cos(2\pi k/N)\}$ for k = 1, 2, ..., (N - 1), when L/N
is a prime fraction.

TABLE I

           Positive tail        Negative tail
   N       5%       1%          5%        1%

   5      0.253    0.297      -0.753    -0.798
   6      0.345    0.447      -0.708    -0.863
   7      0.370    0.510      -0.674    -0.799
   8      0.371    0.531      -0.625    -0.764
   9      0.366    0.533      -0.593    -0.737
  10      0.360    0.525      -0.564    -0.705
  11      0.353    0.515      -0.539    -0.679
  12      0.348    0.505      -0.516    -0.655
  13      0.341    0.495      -0.497    -0.634
  14      0.335    0.485      -0.479    -0.615
  15      0.328    0.475      -0.462    -0.597
  20      0.299    0.432      -0.399    -0.524
  25      0.276    0.398      -0.356    -0.473
  30      0.257    0.370      -0.325    -0.433
 (35)     0.242    0.347      -0.300    -0.401
 (40)     0.229    0.329      -0.279    -0.376
  45      0.218    0.314      -0.262    -0.356
 (50)     0.208    0.301      -0.248    -0.339
 (55)     0.199    0.289      -0.236    -0.324
 (60)     0.191    0.278      -0.225    -0.310
 (65)     0.184    0.268      -0.216    -0.298
 (70)     0.178    0.259      -0.207    -0.287
  75      0.173    0.250      -0.199    -0.276
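If L and N have a common factor, $\alpha$, $L = q\alpha$ and $N = p\alpha$, where p and q
are integers prime to one another. Hence,

$${}_LF_N = \prod_{k=1}^{p\alpha - 1} \Big\{\lambda_k - \cos\frac{2\pi qk}{p}\Big\} = \Big[\prod_{k=1}^{p-1} \Big(\lambda_k - \cos\frac{2\pi k}{p}\Big)\Big]^{\alpha} (\lambda - \cos 2\pi)^{\alpha - 1} = ({}_1F_p)^{\alpha}(\lambda - 1)^{\alpha - 1} = 0.$$

If $\alpha = L$, ${}_LF_N = ({}_1F_p)^{L}(\lambda - 1)^{L-1}$, where p = N/L.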
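The factorization can be checked by listing the roots. The sketch below, in Python (standard library only, with illustrative function names), compares the N - 1 values $\cos(2\pi Lk/N)$ with the roots the factorization predicts: $\alpha$ copies of each $\cos(2\pi k/p)$, k = 1, ..., p - 1, together with $\alpha - 1$ roots equal to 1, where $\alpha$ is the greatest common factor of L and N.

```python
import math


def lag_roots(n, lag):
    """The n - 1 values cos(2*pi*lag*k/n), k = 1..n-1, sorted."""
    return sorted(math.cos(2 * math.pi * lag * k / n) for k in range(1, n))


def predicted_roots(n, lag):
    """Roots implied by (1F_p)^a (lambda - 1)^(a - 1), with a = gcd(n, lag), p = n/a."""
    a = math.gcd(n, lag)
    p = n // a
    roots = [math.cos(2 * math.pi * k / p) for k in range(1, p)] * a
    roots += [1.0] * (a - 1)
    return sorted(roots)


def same_roots(n, lag, tol=1e-9):
    """True when the two sorted lists of roots agree term by term."""
    return all(abs(x - y) < tol
               for x, y in zip(lag_roots(n, lag), predicted_roots(n, lag)))
```

For example, `same_roots(7, 3)` covers the case in which L and N are prime to each other, and `same_roots(12, 8)` the case $\alpha = 4$, p = 3.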







When these results are applied to the large sample distribution of ${}_LR_N$, we
find that it is independent of L. For the more important case in which p = N/L,
the semi-invariants $\kappa_{ij}$ for C and V are exactly the same for all L with a given N.
We see that

$$K_L = -\tfrac12 L \sum_{k=1}^{p-1} \log\{1 - 2(t + s\lambda_k)\} - \tfrac12(L - 1) \log\{1 - 2(t + s)\},$$

where $\lambda_k = \cos(2\pi k/p)$. Hence, $\kappa_{ij} = m!\, 2^m \big\{L\big(\sum_{k=1}^{p-1} \lambda_k^i + 1\big) - 1\big\}$. But
$\sum_{k=1}^{p-1} \lambda_k^i + 1$ is always 0 or a multiple of p when p > i; therefore, the p's cancel
and $\kappa_{ij}$ is the same for all p or for all L, since L = N/p. When $p \le i$, the
$\kappa_{ij}$'s will not be equal for all p. For example $\kappa_{20} = 2(N - 1)$ for p = 2 and
$\kappa_{30} = 2(N - 4)$ for p = 3.

(b) Distributions of ${}_LR_N$ when N/L = p. These results indicate that the
distributions of the serial correlation coefficients for which the number of ob- 
servations is divisible by the lag, so that N/L = p, would include the distribu- 
tions of all the serial correlation coefficients regardless of the values of N and L. 
We will designate any lag L as the primary lag for a given N if N/L = p, an 
integer. For example, ${}_2R_6$ and ${}_4R_6$ have the same density function, but we will
derive only the density function for lag 2, which we will call the primary lag.
The case of p = 1 is trivial, since it involves correlating a series with itself. 
To date, we have derived the exact density functions for p = 2 and p = 3
and the required integrals for p = 4. The significance points have been
tabulated in Table II. For simplicity of notation, we will set ${}_LR_N = {}_LR_p$ and
$V_N = V$.

Case p = 2 (N = 2L). ${}_LR_2V = -u_1 + u_2$ and $V = u_1 + u_2$, where $u_1$ is
distributed as $\chi^2$ with L d.f. and $u_2$ as $\chi^2$ with L - 1 d.f. Hence,

$$D_1(u_1, u_2) = K u_1^{\frac12(L-2)} u_2^{\frac12(L-3)} e^{-\frac12 V},$$

where $1/K = 2^{\frac12(2L-1)}\,\Gamma(\frac12 L)\,\Gamma[\frac12(L - 1)]$. After substituting $u_1 = V(1 - {}_LR_2)/2$
and $u_2 = V(1 + {}_LR_2)/2$ and integrating with respect to V from 0 to $\infty$, we have

$$D({}_LR_2) = \frac{(1 - {}_LR_2)^{\frac12(L-2)}\,(1 + {}_LR_2)^{\frac12(L-3)}}{2^{\frac12(2L-3)}\,B[\frac12 L, \frac12(L - 1)]}.$$

If we set $(1 - {}_LR_2) = 2y$, then the cumulative probability function is

$$P({}_LR_2 > R') = \frac{1}{B[\frac12 L, \frac12(L - 1)]} \int_{y=0}^{\frac12(1 - R')} y^{\frac12(L-2)}\,(1 - y)^{\frac12(L-3)}\, dy.$$

Pearson has tabulated the values of these incomplete Beta functions [5]. In
his notation, $P = I_x[\frac12 L, \frac12(L - 1)]$, where $x = \frac12(1 - R')$. For ${}_LR_2$ not
corrected for the mean, $P = I_x(\frac12 L, \frac12 L)$ [1].

Case p = 3 (N = 3L). ${}_LR_3V = -\frac12 u_1 + u$ and $V = u_1 + u$, where $u_1$ is
distributed as $\chi^2$ with 2L d.f. and u with L - 1 d.f. Therefore, $D_1(u_1, u) =
K u_1^{L-1} u^{\frac12(L-3)} e^{-\frac12 V}$, where $1/K = 2^{\frac12(3L-1)}\,\Gamma(L)\,\Gamma[\frac12(L - 1)]$. After substituting
$u_1 = 2V(1 - {}_LR_3)/3$ and $u = V(1 + 2\,{}_LR_3)/3$ and integrating with respect to
V from 0 to $\infty$, we find that

$$D({}_LR_3) = \frac{2^L\,(1 - {}_LR_3)^{L-1}\,(1 + 2\,{}_LR_3)^{\frac12(L-3)}}{3^{\frac32(L-1)}\,B[L, \frac12(L - 1)]}, \quad {}_LR_3 > -\tfrac12.$$

If we set $x = 2(1 - R')/3$, $P({}_LR_3 > R') = I_x[L, \frac12(L - 1)]$. For ${}_LR_3$ not
corrected for the mean, $P = I_x[L, \frac12 L]$.

Case p = 4 (N = 4L). ${}_LR_4V = -u_2 + u_1$ and $V = u_2 + u_1 + u$, where $u_2$
is distributed as $\chi^2$ with L d.f., $u_1$ with L - 1 d.f. and u with 2L d.f. The density
function of the u's is $D_1(u_2, u_1, u) = K u_2^{\frac12(L-2)} u_1^{\frac12(L-3)} u^{L-1} e^{-\frac12 V}$, where $1/K =
2^{\frac12(4L-1)}\,\Gamma(\frac12 L)\,\Gamma[\frac12(L - 1)]\,\Gamma(L)$. Since $u_1 = [V(1 + {}_LR_4) - u]/2$ and $u_2 =
[V(1 - {}_LR_4) - u]/2$, $0 < u < V(1 - {}_LR_4)$ for ${}_LR_4 > 0$ and $0 < u < V(1 + {}_LR_4)$
for ${}_LR_4 < 0$. For ${}_LR_4 > 0$,

$$D({}_LR_4, V) = \frac{K V e^{-\frac12 V}}{2^{\frac12(2L-3)}} \int_0^{V(1 - {}_LR_4)} [V(1 + {}_LR_4) - u]^{\frac12(L-3)}\,[V(1 - {}_LR_4) - u]^{\frac12(L-2)}\, u^{L-1}\, du.$$

For ${}_LR_4 < 0$, $D({}_LR_4, V)$ is the same except that the upper limit for the integral is
$V(1 + {}_LR_4)$. If we make the substitution y = u/(upper limit) in each case and
then integrate with respect to V from 0 to $\infty$, we have these density functions:

$$D({}_LR_4) = k \begin{cases} (1 + {}_LR_4)^{\frac12(3L-3)} \displaystyle\int_0^1 y^{L-1}\,(1 - y)^{\frac12(L-3)}\,[(1 - {}_LR_4) - y(1 + {}_LR_4)]^{\frac12(L-2)}\, dy, & \text{for } {}_LR_4 < 0, \\[2ex] (1 - {}_LR_4)^{\frac12(3L-2)} \displaystyle\int_0^1 y^{L-1}\,(1 - y)^{\frac12(L-2)}\,[(1 + {}_LR_4) - y(1 - {}_LR_4)]^{\frac12(L-3)}\, dy, & \text{for } {}_LR_4 > 0, \end{cases}$$

where $k = \Gamma[\frac12(4L - 1)]/2^{\frac12(2L-3)}\,\Gamma(L)\,\Gamma(\frac12 L)\,\Gamma[\frac12(L - 1)]$.
The probability integrals must be evaluated for each L. The cumulative
probability functions for L = 2 and 3 are:

$$P({}_2R_4 > R') = 1 - \frac{1}{\sqrt2} \begin{cases} (1 + R')^{\frac52} - R'^{\frac32}(5 + R')/\sqrt2, & \text{for } R' > 0, \\ (1 + R')^{\frac52}, & \text{for } R' < 0, \end{cases}$$

$$P({}_3R_4 > R') = \frac{1}{2\sqrt2} \begin{cases} (1 - R')^{\frac92}, & \text{for } R' > 0, \\ (1 - R')^{\frac92} - (-R'/2)^{\frac52}\,(22R'^2 + 36R' + 126), & \text{for } R' < 0. \end{cases}$$

Since the density functions are much simpler for R' > 0 when L is odd and
for R' < 0 when L is even, we have derived only these significance points for
L > 3 and interpolated for the intermediate points. It was noted that the
significance points approach those given in Table I for the first lag. For these
comparisons, see Table III below. Note that for L > 7 the 5% points are
almost identical and the 1% points are nearly accurate to two decimal places.
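The cumulative functions for p = 2, 3 and 4 can be evaluated without tables. The sketch below is in Python (standard library only); the function names, the Simpson's-rule evaluation of the incomplete Beta ratio and the bisection solver are all choices made here for illustration, and the formulas are taken as read from the three cases above. The numerical Beta ratio is meant for upper-tail points with L at least 2, where the integrand stays bounded over the range of integration.

```python
import math


def incomplete_beta_ratio(x, a, b, steps=2000):
    """I_x(a, b) by Simpson's rule; intended for a >= 1 and x well below 1."""
    h = x / steps

    def f(y):
        return y ** (a - 1) * (1 - y) ** (b - 1)

    s = f(0.0) + f(x)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * f(i * h)
    beta = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return s * h / 3 / beta


def upper_tail_p2(r, lag):
    """P(R > r) when N = 2L: I_x[L/2, (L-1)/2] with x = (1 - r)/2."""
    return incomplete_beta_ratio((1 - r) / 2, lag / 2, (lag - 1) / 2)


def upper_tail_p3(r, lag):
    """P(R > r) when N = 3L: I_x[L, (L-1)/2] with x = 2(1 - r)/3."""
    return incomplete_beta_ratio(2 * (1 - r) / 3, lag, (lag - 1) / 2)


def upper_tail_p4_lag2(r):
    """Closed form for N = 4L with L = 2."""
    bracket = (1 + r) ** 2.5
    if r > 0:
        bracket -= r ** 1.5 * (5 + r) / math.sqrt(2)
    return 1 - bracket / math.sqrt(2)


def upper_tail_p4_lag3(r):
    """Closed form for N = 4L with L = 3."""
    bracket = (1 - r) ** 4.5
    if r < 0:
        bracket -= (-r / 2) ** 2.5 * (22 * r * r + 36 * r + 126)
    return bracket / (2 * math.sqrt(2))


def solve_point(tail, target, lo, hi):
    """Bisection for tail(r) = target, tail being decreasing on (lo, hi)."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if tail(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

As a usage example, `solve_point(lambda r: upper_tail_p2(r, 2), 0.05, -0.9, 0.999)` returns about 0.805, the first entry of Table II, and `solve_point(upper_tail_p4_lag2, 0.05, -1.0, 1.0)` returns about 0.373, the first exact entry of Table III.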







TABLE II
Significance points of ${}_LR_N$ for p = 2 and 3³

                 p = 2 (N = 2L)                        p = 3 (N = 3L)
        Positive tail    Negative tail        Positive tail    Negative tail
  L      5%      1%       5%       1%          5%      1%       5%       1%

   2    0.805   0.960   -0.99    -1.00        0.488   0.762   -0.496   -0.50
   3    0.729   0.907   -0.928   -0.994       0.447   0.677   -0.474   -0.496
   4    0.664   0.852   -0.848   -0.950       0.406   0.610   -0.439   -0.480
   5    0.612   0.802   -0.773   -0.902       0.373   0.559   -0.406   -0.461
   6    0.571   0.759   -0.712   -0.856       0.346   0.518   -0.377   -0.440
   7    0.536   0.721   -0.662   -0.812       0.324   0.485   -0.354   -0.420
   8    0.507   0.688   -0.620   -0.774       0.306   0.457   -0.334   -0.402
   9    0.483   0.659   -0.585   -0.739       0.291   0.433   -0.316   -0.387
  10    0.462   0.634   [illegible] -0.708    0.278   0.413   -0.301   -0.373
  12    0.428   0.590   -0.505   -0.656       0.256   0.380   -0.276   -0.347
  14    0.399   0.554   -0.467   -0.612       0.239   0.353   -0.256   -0.326
  16    0.376   0.523   -0.436   -0.577       0.225   0.332   -0.240   -0.308
  18    0.357   0.498   -0.410   -0.546       0.213   0.314   -0.227   -0.293
  20    0.340   0.476   -0.389   -0.520       0.202   0.298   -0.215   -0.280
  25    0.308   0.432   -0.347   -0.469       0.182   0.268   -0.193   -0.254
  30    0.282   0.398   -0.317   -0.431       0.167   0.245   -0.176   -0.234
  40    0.247   0.348   -0.273   -0.374       0.146   0.212   -0.153   -0.205
  50    0.222   0.314   -0.243   -0.335       0.131   0.191   -0.136   -0.184

TABLE III⁴
Significance points for p = 4

                   Positive tail                        Negative tail
                5%               1%                 5%                 1%
  L    N    Exact   Table I  Exact   Table I    Exact    Table I   Exact    Table I

  2    8    0.373   0.371    0.618   0.531     -0.653   -0.625    -0.818   -0.764
  3   12    0.353   0.348    0.547   0.505     -0.528   -0.516    -0.692   -0.655
  4   16    0.325*  0.322    0.490*  0.466     -0.451   -0.447    -0.604   -0.580
  5   20    0.301   0.299    0.451   0.432     -0.402*  -0.409    -0.543*  -0.524
  6   24    0.281*  0.280    0.419*  0.404     -0.365   -0.363    -0.497   -0.482
  7   28    0.264   0.264    0.392   0.380     -0.338*  -0.337    -0.460*  -0.448

³ L is the lag and p = N/L.
⁴ * indicates interpolated values.































Case p > 4. We have not set up any of the density functions for p > 4; 
however, it appears that the significance points given for lag 1 would be ac- 
curate enough for the higher lags. The exact significance points for lag 2 have 
been derived for p = 5 and 7. The reader may note the close approximation 
given by the significance points for lag 1 when p = 7. We hope to check the 
lag 1 approximation for other lags in the near future. 


TABLE IV
Some significance points for lag 2

                    Positive tail        Negative tail
                    5%       1%          5%        1%

p = 5 (N = 10)
  Exact            0.342    0.540      -0.417    -0.595
  Table I          0.360    0.525      -0.564    -0.705

p = 7 (N = 14)
  Exact            0.335    0.482      -0.479    -0.616
  Table I          0.335    0.485      -0.479    -0.615


7. Summary. 1. The exact and large sample distributions have been derived
for the serial correlation coefficient for lag 1 and the exact significance points
tabulated for N, the number of observations, up to 75; for N > 75, the large
sample approximations can be used.

2. It has been noted that the distributions for any lag L are the same as those 
for lag 1 when L and N are prime to each other. In general the distribution of 
the serial correlation coefficient can be derived for any L and N by using only 
those distributions for which L is a factor of N. The distributions and
significance points have been derived for N/L = p = 2, 3 and 4. For p > 4 (N > 4L),
the significance points given for lag 1 probably can be used when L is greater
than 4 or 5. The accuracy of this approximation has been checked for lag 2.

3. These significance points should be useful in determining the methods of 
studying a time series, as suggested by Wold, and in the formulation of a better 
test of the significance of regression coefficients when we know that the observa- 
tions are correlated in time. In addition, we now have a method of testing our 
assumptions of independence for any set of data. 


REFERENCES 


[1] R. L. ANDERSON, Serial Correlation in the Analysis of Time Series, unpublished thesis,
Library, Iowa State College, Ames, Iowa, 1941.

[2] M. S. BARTLETT, "Some aspects of the time-correlation problem in regard to tests of
significance," Roy. Stat. Soc. Jour., Vol. 98 (1935), pp. 536-543.

[3] W. G. COCHRAN, "Distribution of quadratic forms in a normal system with applications
to the analysis of covariance," Camb. Phil. Soc. Proc., Vol. 30 (1934), pp. 178-191.

[4] L. E. DICKSON, Modern Elementary Theory of Numbers, U. of Chicago Press, 1939.

[5] KARL PEARSON (Editor), Tables of the Incomplete Beta-Function, Cambridge U. Press,
1934.

[6] G. TINTNER, "Tests of significance in time series," Annals of Math. Stat., Vol. 10 (1939),
p. 141 ff.

[7] G. TINTNER, The Variate Difference Method, Principia Press, Bloomington, Indiana,
Appendix 5B, 1940.

[8] H. WOLD, A Study in the Analysis of Stationary Time Series, Almquist and Wiksells
Boktryckeri A. B., Uppsala, 1939.

[9] G. U. YULE, "On the time-correlation problem," Roy. Stat. Soc. Jour., Vol. 84 (1921),
pp. 496-537.








SERIAL CORRELATION AND QUADRATIC FORMS IN NORMAL
VARIABLES¹


By TJALLING KOOPMANS


Penn Mutual Life Insurance Company 


1. Estimation problems of stochastical processes. In regression analysis of 
economic time series a situation often arises in which a certain observed quan- 
tity represents a “dependent” variable at one time and an “independent”’ vari- 
able at a later time. For instance, the following relations may exist between 
the price $x_t$ and the supply $y_t$ of hogs at any time t:

(1)    $x_t = \alpha - \beta y_t + z_t'$,
       $y_t = \gamma + \delta x_{t-1} + z_t''$.

The first of these equations expresses the price-depressing influence of large
supplies. The second equation expresses the supply-stimulating influence of
high prices one time unit (in the case of hogs, about 18 months) earlier. The
terms $z_t'$ and $z_t''$ represent influences of additional variables and/or random
disturbances. Elimination of $y_t$ leads to

(2)    $x_t = \varepsilon - \zeta x_{t-1} + z_t$.

The statistical estimation of the parameters $\varepsilon$ and $\zeta$ of such an equation is
usually attempted by the ordinary least squares method, disregarding the fact
that the observation $x_t$ is both a dependent variable at time t and an
independent variable at time t + 1. The following simple example shows that this
may lead to erroneous results particularly in small samples. Suppose that $\varepsilon = 0$,
$\zeta = -1$, and that $z_t$ is a purely random variable with mean 0, while only three
successive observations are available. The least squares estimate of $\zeta$ is then
given by the slope of the straight line connecting the points $(x_1, x_2)$ and $(x_2, x_3)$
in the plane of $x_{t-1}$ and $x_t$. This slope, however, has an expected value 0,
because according to our assumptions the conditional expectation of $x_3$ for a
prescribed value of $x_2$ is equal to $x_2$, whatever value that is. Thus the least
squares estimate of $\zeta = -1$ has an expected value 0 showing an important bias.

Mathematical business cycle theories utilize systems of equations much more
complicated than the example considered [1]. The common feature of these
equation systems is, however, that they reduce fluctuations in a set of economic
variables to

1. earlier fluctuations in the same set of variables,

2. changes in given non-economic or external variables, and

3. random disturbances.
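The three-observation example earlier in this section can be illustrated numerically. The sketch below is in Python (standard library only) and rests on assumptions that are not in the text: equation (2) is read as $x_t = \varepsilon - \zeta x_{t-1} + z_t$, so that $\varepsilon = 0$, $\zeta = -1$ gives $x_t = x_{t-1} + z_t$; the disturbance is taken to be normal with unit variance; and the values $x_1 = 0$, $x_2 = 1$, the number of draws and the seed are arbitrary. It holds $x_1$ and $x_2$ fixed, as the conditional argument does, and averages the slope over drawings of $z_3$.

```python
import random


def mean_slope_given_x1_x2(x1, x2, draws=20000, seed=0):
    """Average slope through (x1, x2) and (x2, x3) when x3 = x2 + z3, z3 ~ N(0, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        x3 = x2 + rng.gauss(0.0, 1.0)    # the process x_t = x_{t-1} + z_t
        total += (x3 - x2) / (x2 - x1)   # slope of the line through the two points
    return total / draws
```

Under these assumptions `mean_slope_given_x1_x2(0.0, 1.0)` is close to 0, although the coefficient of $x_{t-1}$ in the generating process is 1 in absolute value, which is the bias the example describes.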


1 This investigation was carried out at the Local and State Government Section (Prince- 
ton Surveys) of the School for Public and International Affairs of Princeton University. 
The main results were presented to the Chicago meeting of the Institute of Mathematical 
Statistics in September 1941. 




An equation system of this type has been said to define a stochastical process 
in a number of variables [2]. The statistical testing of mathematical business 
cycle theories accordingly requires a theory of estimation of the parameters of 
stochastical processes. The operation of stochastical processes is also apparent 
in meteorological data. Assuming a normal distribution for the random dis- 
turbances, it will be seen that the mathematical prerequisite for an estimation 
theory of stochastical processes is the study of joint distributions of certain quad- 
ratic forms in normal variables. 

In this article only the very simplest problem of this class will be treated,
namely that of testing the significance of $\xi$ in equation (2) if it is known that
$|\xi| < 1$ and that $\epsilon$ is equal to zero. This is the problem of testing the
significance of single serial regression, or of single serial correlation, because
the distinction between single regression and correlation coefficients disappears
in this simple case for coefficients absolutely smaller than unity.

In the next section the problem of estimating single serial correlation if the 
mean is known will be stated and the difficulties involved will be discussed. In 
section 3 a conditional distribution of a quadratic form in normal variables will 
be derived. The proof in section 3 covers only forms in five or more variables, 
but another proof covering any number of variables is given in section 4. This 
distribution is then applied to devise a test of significance of serial correlation 
in section 5. The reading of section 4 is not necessary for the understanding of
section 5. Readers desiring to locate only the main results can read those from
equations (3), (11), (16), (21), (36), (61), (62), (74), (79), (82), (92), and (96).





2. The estimation of serial correlation. In the stochastical process

(3) $x_t = \rho x_{t-1} + z_t,$

where the $z_t$ are independent drawings from a normal distribution with mean 0
and standard deviation $\sigma$, the parameter $\rho$ may have any positive or negative
value. The process will only be a stationary one if

(4) $|\rho| < 1.$

For, since

(5) $E x_t = E x_{t-1} z_t = E z_t = 0, \quad E z_t^2 = \sigma^2,$

and

(6) $E x_t^2 = \rho^2 E x_{t-1}^2 + \sigma^2,$

a variance of $x_t$ independent of $t$ will be possible only if (4) is satisfied, in which
case

(7) $E x_t^2 = \frac{\sigma^2}{1 - \rho^2}.$























16 TJALLING KOOPMANS 





If (4) is not satisfied, however, $E x_t^2$ will be an increasing function of $t$ tending to
infinity in approximately geometric progression as $t$ increases beyond any limit. In this
article the limitation (4) will be imposed a priori.
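
As a numerical illustration, which is not part of the original argument, the recursion (6) may be iterated directly to exhibit both regimes. The Python sketch below is only a sketch: the function name `variance_path`, the starting variance 0 and the chosen values of $\rho$ are assumptions made for the example.

```python
def variance_path(rho, sigma, v0, steps):
    """Iterate (6): v_t = rho**2 * v_{t-1} + sigma**2, starting from v0 (assumed)."""
    path = [v0]
    for _ in range(steps):
        path.append(rho ** 2 * path[-1] + sigma ** 2)
    return path

# Case |rho| < 1: the variance settles at sigma^2 / (1 - rho^2), as in (7).
stationary = variance_path(rho=0.5, sigma=1.0, v0=0.0, steps=200)
limit = 1.0 / (1.0 - 0.5 ** 2)

# Case |rho| > 1: the variance grows, ultimately by a factor rho^2 per step.
explosive = variance_path(rho=1.5, sigma=1.0, v0=0.0, steps=50)
growth_ratio = explosive[-1] / explosive[-2]
```

With these assumed values the stationary path approaches 4/3, while the ratio of successive variances in the second case approaches $\rho^2 = 2.25$.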

It follows from (3), (7), and the assumptions regarding $z_t$, that the joint
distribution of the quantities $x_1, z_2, z_3, \ldots, z_T$ is given by

(8) $\left(\frac{1 - \rho^2}{2\pi\sigma^2}\right)^{\frac12} e^{-\frac12 (1 - \rho^2) x_1^2/\sigma^2} \, (2\pi\sigma^2)^{-\frac12 (T-1)} e^{-\frac12 \sum_{t=2}^{T} z_t^2/\sigma^2} \, dx_1\, dz_2 \cdots dz_T.$

Since the Jacobian of the transformation (3) from the variables $x_1, z_2, \ldots, z_T$
to the variables $x_1, x_2, \ldots, x_T$ equals unity, the joint distribution function of the
$T$ successive observations $x_1, x_2, \ldots, x_T$ that make up a sample is found simply
by replacing the $z_t$ in (8) by the corresponding expressions in the $x_t$. This
leads to the distribution

(9) $\frac{(1 - \rho^2)^{\frac12}}{(2\pi\sigma^2)^{\frac12 T}} \, e^{-\frac12 [l - 2\rho m + (1 + \rho^2) n]/\sigma^2} \, dx_1\, dx_2 \cdots dx_T,$

in which the three quadratic forms

(10) $l = x_1^2 + x_T^2, \quad m = x_1 x_2 + x_2 x_3 + \cdots + x_{T-1} x_T, \quad n = x_2^2 + x_3^2 + \cdots + x_{T-1}^2,$

are the only characteristics of the sample that enter. In other words, $l$, $m$
and $n$ are jointly sufficient statistics for the estimation of $\rho$ and $\sigma$. It may be
noted that these statistics remain the same if the series of observations is taken
in inverse order.
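
The reduction of the exponent in (8) to the three statistics of (10) can be checked numerically. The following Python sketch is an illustration under assumptions: the function names, the sample length 12 and the value 0.3 for $\rho$ are chosen for the example only. It compares the exponent computed from (8) and (3) with $l - 2\rho m + (1 + \rho^2) n$, and confirms that reversing the series leaves $l$, $m$ and $n$ unchanged.

```python
import random

def sufficient_statistics(x):
    """Return (l, m, n) of (10) for observations x_1, ..., x_T with T >= 2."""
    l = x[0] ** 2 + x[-1] ** 2
    m = sum(a * b for a, b in zip(x, x[1:]))
    n = sum(v ** 2 for v in x[1:-1])
    return l, m, n

def exponent_direct(x, rho):
    """(1 - rho^2) x_1^2 + sum over t >= 2 of (x_t - rho x_{t-1})^2, from (8) and (3)."""
    head = (1.0 - rho ** 2) * x[0] ** 2
    tail = sum((b - rho * a) ** 2 for a, b in zip(x, x[1:]))
    return head + tail

random.seed(0)                      # arbitrary seed, for reproducibility only
x = [random.gauss(0.0, 1.0) for _ in range(12)]
rho = 0.3
l, m, n = sufficient_statistics(x)
via_statistics = l - 2.0 * rho * m + (1.0 + rho ** 2) * n
direct = exponent_direct(x, rho)
reversed_statistics = sufficient_statistics(x[::-1])
```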

It seems natural to attempt maximum likelihood estimation of $\rho$ and $\sigma$, even
if the usual optimal properties of estimates so obtained have so far not been
proved for stochastical processes. Straightforward calculations lead to the
following third-degree equation for the maximum likelihood estimate $\hat\rho$ of $\rho$:

(11) $T (m - \hat\rho n)(1 - \hat\rho^2) - \hat\rho\, [l - 2\hat\rho m + (1 + \hat\rho^2) n] = 0.$

Of course the root asymptotically approaching $m/n$ has to be selected. The
corresponding maximum likelihood estimate $\hat\sigma$ of $\sigma$ is given by

(12) $\hat\sigma^2 = \frac{1}{T} [l - 2\hat\rho m + (1 + \hat\rho^2) n].$
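
A minimal numerical sketch of (11) and (12) follows. It is not the author's procedure: it assumes that the cubic is written with the factor $T$, as obtained by differentiating the logarithm of (9), and it locates the root between $-1$ and $+1$ by bisection, using the fact that the left hand member of (11) equals $\sum (x_t + x_{t+1})^2 > 0$ at $-1$ and $-\sum (x_{t+1} - x_t)^2 < 0$ at $+1$. The function names and the simulated sample are hypothetical.

```python
import random

def likelihood_cubic(r, l, m, n, T):
    """Left hand member of (11) for a trial value r of rho."""
    return T * (m - r * n) * (1.0 - r * r) - r * (l - 2.0 * r * m + (1.0 + r * r) * n)

def ml_estimates(l, m, n, T, tol=1e-13):
    """Bisection for the root of (11) in (-1, 1), then sigma^2 from (12)."""
    lo, hi = -1.0, 1.0              # cubic is positive at -1 and negative at +1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if likelihood_cubic(mid, l, m, n, T) > 0.0:
            lo = mid
        else:
            hi = mid
    rho_hat = 0.5 * (lo + hi)
    sigma2_hat = (l - 2.0 * rho_hat * m + (1.0 + rho_hat ** 2) * n) / T
    return rho_hat, sigma2_hat

# Simulated sample from (3) with assumed rho = 0.6, sigma = 1 and T = 400.
random.seed(1)
rho_true, T = 0.6, 400
x = [random.gauss(0.0, 1.0) / (1.0 - rho_true ** 2) ** 0.5]
for _ in range(T - 1):
    x.append(rho_true * x[-1] + random.gauss(0.0, 1.0))

l = x[0] ** 2 + x[-1] ** 2
m = sum(a * b for a, b in zip(x, x[1:]))
n = sum(v ** 2 for v in x[1:-1])
rho_hat, sigma2_hat = ml_estimates(l, m, n, T)
```

For a sample of this length the selected root lies close to $m/n$, in agreement with the remark following (11).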





In view of the complicated definition of $\hat\rho$ it seems desirable as a first step
to derive from (9) the joint probability distribution of $l$, $m$ and $n$. This requires
a transformation of the volume element $dx_1 \cdots dx_T$ in (9) to the form

(13) $\varphi(l, m, n)\, dl\, dm\, dn,$

which it assumes after integration over $T - 3$ other coordinates the variation
of which does not change $l$, $m$ and $n$.

Since this is purely a problem of integration completely defined by the
expressions (10), the resulting function $\varphi(l, m, n)$ is independent of $\rho$ and $\sigma$. The
joint distribution

(14) $\frac{(1 - \rho^2)^{\frac12}}{(2\pi\sigma^2)^{\frac12 T}} \, e^{-\frac12 [l - 2\rho m + (1 + \rho^2) n]/\sigma^2} \, \varphi(l, m, n)\, dl\, dm\, dn$

of $l$, $m$ and $n$ will thus be known for any values of $\rho$ and $\sigma$ as soon as it is known
for one particular pair of values.

If as particular values we choose $\rho = 0$ and $\sigma = 1$, the $x_t$ become identical
with the $z_t$, and the problem is that of finding the joint distribution of the
quadratic forms (10) in independent normal variables with mean 0 and variance 1.
Even if so simplified, the problem is a complicated one. While there
are infinitely many common sets of principal axes of the forms $l$ and $n$, none of
these sets of axes has a single axis in common with $m$.

Although no solution is offered for this problem, the following suggestion may
be ventured. Once $\varphi(l, m, n)$ is known, the mathematically simplest procedure
for interval estimation of $\rho$ might well be one that confines attention to samples
having the same values of $l$ and $n$ as the sample actually obtained. Suitably
chosen percentiles of the conditional distribution of $m$ with $l$ and $n$ fixed at the
observed values would be convertible into confidence limits for $\rho$ with the
help of (11).

A simpler mathematical problem is encountered in testing whether the existence
of a difference between $\rho$ and 0 can be established, or, in other words, in
testing the significance of serial correlation. If $\rho = 0$, the distribution function
in (9) depends only on $p = l + n$, not on $l$ or $n$ separately, and exact significance
limits for $m$ can be derived from the joint distribution

(15) $(2\pi\sigma^2)^{-\frac12 T} e^{-\frac12 p/\sigma^2} \, \psi(p, m)\, dp\, dm$

of $p$ and $m$ only. This distribution will be studied in the next three sections.
It is hoped that the methods there applied will provide a useful starting point
in the treatment of other problems of the class described in section 1.


3. Distribution of a quadratic form in normal variables on the unit sphere.
Consider two quadratic forms in $T$ independent normal variables with mean 0
and variance 1,

(16) $p = x_1^2 + x_2^2 + \cdots + x_T^2, \quad q = \kappa_1 x_1^2 + \kappa_2 x_2^2 + \cdots + \kappa_T x_T^2.$

While the characteristic values of the form $p$ are all coincident with the value 1,
the characteristic values $\kappa_t$ of $q$ are provisionally supposed to be different from
each other, so that they can be arranged in decreasing order:

(17) $\kappa_1 > \kappa_2 > \cdots > \kappa_T.$

The probability density

(18) $(2\pi)^{-\frac12 T} e^{-\frac12 p}$

in the space of the variables is constant on any sphere

(19) $p = p_0 = \text{constant},$

while the distribution function $g(p)$ of $p$ is that of the $\chi^2$-distribution with $T$
degrees of freedom

(20) $g(p) = \frac{p^{\frac12 T - 1} e^{-\frac12 p}}{2^{\frac12 T}\, \Gamma(\frac12 T)}.$















The hyper-surfaces on which the ratio

(21) $r = \frac{q}{p}$

of $q$ to $p$ is constant are cones with the origin as vertex dissecting the same
proportion of the metric "surface" of each sphere (19). It follows that the
conditional distribution function of $r$ for a prescribed value $p_0$ of $p$ is independent
of that value $p_0$, and is therefore equal to the unrestricted distribution
function $h(r)$ of $r$. In other words, $p$ and $r$ are independently distributed. Their
joint distribution being

(22) $g(p)\, h(r)\, dp\, dr,$

the joint distribution of $p$ and $q = rp$ is found to be

(23) $f(p, q)\, dp\, dq = g(p)\, h\!\left(\frac{q}{p}\right) dp\, \frac{dq}{p} = \frac{g(p)}{p}\, h\!\left(\frac{q}{p}\right) dp\, dq.$

The function $h(\ )$ may therefore also be described as the conditional distribution
function of $q$ on the unit sphere

(24) $p = 1.$

Since $\kappa_1$ and $\kappa_T$ are the extreme values of $q$ under the condition (24), the function
$h(r)$ vanishes outside these limits.
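
The two properties just stated, that $r$ is confined between $\kappa_T$ and $\kappa_1$ and that it is distributed independently of $p$, can be illustrated by simulation. The Python sketch below is an illustration only: the characteristic values, the seed and the sample size are assumptions, and a vanishing sample correlation between $p$ and $r$ is a necessary consequence of independence, not a proof of it.

```python
import random

def draw_p_and_r(kappa, rng):
    """One drawing of p and r = q/p from (16) and (21)."""
    squares = [rng.gauss(0.0, 1.0) ** 2 for _ in kappa]
    p = sum(squares)
    q = sum(k * s for k, s in zip(kappa, squares))
    return p, q / p

def sample_correlation(a, b):
    """Ordinary product-moment correlation of two equally long lists."""
    count = len(a)
    mean_a, mean_b = sum(a) / count, sum(b) / count
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5

rng = random.Random(42)                 # arbitrary seed
kappa = [2.0, 1.0, 0.5, -1.0]           # assumed characteristic values, T = 4
draws = [draw_p_and_r(kappa, rng) for _ in range(20000)]
p_values = [d[0] for d in draws]
r_values = [d[1] for d in draws]
corr = sample_correlation(p_values, r_values)
```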

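We shall now derive an expression for $h(r)$ by comparing (23) with an expression
for $f(p, q)$ obtained through the inversion theorem of characteristic
functions. The characteristic function $F(\eta, \theta)$ corresponding to the variables
$p$ and $q$ is

(25) $F(\eta, \theta) = (2\pi)^{-\frac12 T} \int e^{i\eta p + i\theta q - \frac12 p}\, dx_1 \cdots dx_T = D^{-\frac12}(\eta, \theta),$

where, according to (16), the polynomial $D(\eta, \theta)$ is given by

(26) $D(\eta, \theta) = \prod_{t=1}^{T} (1 - 2i\eta - 2i\theta\kappa_t).$

It follows from the inversion theorem that

(27) $f(p, q) = (2\pi)^{-2} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-i\eta p - i\theta q}\, D^{-\frac12}(\eta, \theta)\, d\eta\, d\theta,$

the order of integration over $\eta$ and $\theta$ being immaterial.
Any elementary factor of $D(\eta, \theta)$ may be written

(28) $d_t(\eta, \theta) = 1 - 2i\eta - 2i\theta\kappa_t = (1 - 2i\eta) \left(1 - \frac{2i\theta\kappa_t}{1 - 2i\eta}\right).$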

Figure 1. Paths of integration in the $\kappa$-plane


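First considering the integration over $\theta$ (while $\eta$ has some fixed value), we may
instead of $\theta$ use

(29) $\kappa = \frac{1 - 2i\eta}{2i\theta}$

as an integration variable. The path of integration $c_\eta$ in the $\kappa$-plane then is a
straight line from 0 to $-\frac{1 - 2i\eta}{2i} \cdot \infty$ and another straight line from $\frac{1 - 2i\eta}{2i} \cdot \infty$
back to 0, as indicated in Figure 1, and the transformed integral (27) runs

(30) $f(p, q) = (2\pi)^{-2} \int_{-\infty}^{\infty} e^{-i\eta p} (1 - 2i\eta)^{-\frac12 T + 1}\, d\eta \int_{c_\eta} \left[ -\frac{1}{2i\kappa^2}\, e^{-(1 - 2i\eta) q/2\kappa} \left\{ \prod_{t=1}^{T} \left(1 - \frac{\kappa_t}{\kappa}\right) \right\}^{-\frac12} \right] d\kappa.$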














The integrand

(31) $-\frac{1}{2i\kappa^2}\, e^{-(1 - 2i\eta) q/2\kappa} \left\{ \prod_{t=1}^{T} \left(1 - \frac{\kappa_t}{\kappa}\right) \right\}^{-\frac12}$

for the integration over $\kappa$ has singularities only in the points $\kappa = 0$ and $\kappa = \kappa_t$,
$t = 1, 2, \ldots, T$. In order to simplify the argument we shall suppose that the
quadratic form $q$ is positive definite, or, in connection with (17), that $\kappa_T > 0$.


The location of the singularities is then as pictured in Figure 1. At $\kappa = \infty$
the integrand (31) is regular and of the order of magnitude of $\kappa^{-2}$. Consequently
a curve integral of (31) along the whole or any part of the circle $|\kappa| = R$
will tend to 0 if $R$ tends to infinity. Using a theorem of Cauchy, it is therefore
permissible in (30) to replace the described path $c_\eta$ by another path $c_{\eta,R}$ which
starts out along $c_\eta$ from 0 up to $-\frac{1 - 2i\eta}{i\,|1 - 2i\eta|} R$, from there follows the circle $|\kappa| = R$
to the right over an angle $\pi$ up to the point $\frac{1 - 2i\eta}{i\,|1 - 2i\eta|} R$, and from there returns
to 0 along $c_\eta$, provided that $R > \kappa_1$. After reversing the direction in which
the path is followed in order to do away with the negative sign in (31), the
path so obtained can again be replaced by the path $\gamma_{\eta,d}$ shown in Figure 1,
which coincides with $c_\eta$ only up to a small distance $d$ from the real axis, and
encircles all singularities $\kappa_t$ while retaining a distance $d$ from the part of the real
axis to the left of and up to $\kappa_1$. Finally, a path of integration $\gamma'$ independent
of the value of $\eta$ is obtained by going to the limit in which $d = 0$. This is an
integration twice along the part of the real axis between 0 and $\kappa_1$, integrating
from 0 to $\kappa_1$ that branch of the integrand which is obtained by passing "under"
each singularity, and going back from $\kappa_1$ to 0 with the branch obtained by passing
"around" $\kappa_1$ and "over" each other singularity². The integral so obtained converges
at each singularity. This is also true for the singularity $\kappa = 0$ because
we are dealing only with positive values of $q$, which makes the exponential
factor in (31) tend to 0 if $\kappa$ approaches zero. We shall now show that if in
(30) the path $\gamma'$ is substituted for $c_\eta$ (with a change in sign), the order of
integration over $\kappa$ and $\eta$ can be reversed if $T \geq 5$.
The integral over $\eta$, taken from (30),

(32) $I = (2\pi)^{-1} \int_{-\infty}^{\infty} e^{-i\eta (p - q/\kappa)} (1 - 2i\eta)^{-\frac12 T + 1}\, d\eta,$

(in which $\kappa$ is now a positive real number), is by the substitution $\chi^2 = p - q/\kappa$
transformed to the integral encountered in the derivation of the $\chi^2$-distribution
(with $T - 2$ degrees of freedom) by the inversion theorem of characteristic
functions. It may be quoted without proof (see [3] p. 42) that it equals

(33) $I = \frac{(p - q/\kappa)^{\frac12 T - 2}\, e^{-\frac12 (p - q/\kappa)}}{2^{\frac12 T - 1}\, \Gamma(\frac12 T - 1)}$ if $p - q/\kappa \geq 0$, or $\kappa \geq r$,

     $I = 0$ if $p - q/\kappa \leq 0$, or $\kappa \leq r$.






2 For even values of $T$ the parts of $\gamma'$ for which $\kappa < \kappa_T$ can be disregarded, because on these
parts the same branch of the integrand is integrated in opposite directions.












It is necessary to observe, however, that the integral $I$ converges uniformly for
all real values of $\kappa$ whenever $T \geq 5$, because then

(34) $\int_{-\infty}^{\infty} |1 - 2i\eta|^{-\frac12 T + 1}\, d\eta$

is convergent. Because of this property, the reversal of the order of integration
is allowed for $T \geq 5$.


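If now in (30) we first carry out the integration over $\eta$ and use (33), we are
left with

(35) $f(p, q) = \frac{e^{-\frac12 p}}{2\pi i\, 2^{\frac12 T}\, \Gamma(\frac12 T - 1)} \int_{\gamma_r} \left(p - \frac{q}{\kappa}\right)^{\frac12 T - 2} \left\{ \prod_{t=1}^{T} \left(1 - \frac{\kappa_t}{\kappa}\right) \right\}^{-\frac12} \frac{d\kappa}{\kappa^2},$

where $\gamma_r$ now is any curve proceeding from $\kappa = r$ into the lower half-plane,
crossing the real axis at a point $\kappa > \kappa_1$, and returning to $\kappa = r$ through the upper
half-plane, as indicated in Figure 2. (The path directly obtained is a path $\gamma'_r$
consisting of twice the real axis between $r$ and $\kappa_1$, the branches of the integrand
being taken as indicated by $\gamma_r$.) Comparing (35) with (23) and (20), using (21)
and the well-known formula $\Gamma(x) = (x - 1)\Gamma(x - 1)$, we find the following
expression for the distribution function of $r$:

(36) $h(r) = \frac{\frac12 T - 1}{2\pi i} \int_{\gamma_r} \frac{(\kappa - r)^{\frac12 T - 2}}{\left\{ \prod_{t=1}^{T} (\kappa - \kappa_t) \right\}^{\frac12}}\, d\kappa.$

Figure 2. The integration path $\gamma_r$

This function vanishes for $r \geq \kappa_1$. In order to arrive at a positive distribution
function for $\kappa_T < r < \kappa_1$ that branch of the integrand must be selected which
is positive for real values of $\kappa$ exceeding $\kappa_1$.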

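It is worth noting that the degree in $\kappa$ of the numerator of the integrand is
two less than that of the denominator. Owing to this fact, indeed, the distribution
function $h(r)$ satisfies the two obvious conditions:

(37) $h(r) = 0$ for $r \leq \kappa_T$, $\quad \int_{\kappa_T}^{\kappa_1} h(r)\, dr = 1.$

For $r \leq \kappa_T$ the path of integration in (36) can be replaced by any closed contour
enclosing all the singularities $r, \kappa_T, \ldots, \kappa_1$ ($r$ is a singularity only if $T$ is odd).
Taking as such a contour the circle $|\kappa| = R$ with $R$ tending to infinity, we find
that $h(r) = 0$ because the integrand is of an order $\kappa^{-2}$ at $\kappa = \infty$. Further, if
$\gamma_r$ is again replaced by $\gamma'_r$ which runs entirely along the real axis,

(38) $\int_{\kappa_T}^{\kappa_1} h(r)\, dr = \frac{1}{2\pi i} \int_{\kappa_T}^{\kappa_1} \left[ \int_{\gamma'_r} \frac{(\frac12 T - 1)(\kappa - r)^{\frac12 T - 2}}{\left\{ \prod_{t=1}^{T} (\kappa - \kappa_t) \right\}^{\frac12}}\, d\kappa \right] dr$

     $= \frac{1}{2\pi i} \int_{\gamma'_{\kappa_T}} \left\{ \prod_{t=1}^{T} (\kappa - \kappa_t) \right\}^{-\frac12} \left[ \int_{\kappa_T}^{\kappa} (\tfrac12 T - 1)(\kappa - r)^{\frac12 T - 2}\, dr \right] d\kappa$

     $= \frac{1}{2\pi i} \int_{\gamma_{\kappa_T}} \frac{(\kappa - \kappa_T)^{\frac12 T - 1}}{\left\{ \prod_{t=1}^{T} (\kappa - \kappa_t) \right\}^{\frac12}}\, d\kappa = 1,$

because the integrand in the last integral is of the order of $\kappa^{-1}$ at the point
$\kappa = \infty$.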

The quantities $r$ and $\kappa_t$ enter into the right hand member of (36) only in the
form of differences from the integration variable $\kappa$. The addition of a constant
$\epsilon$ to both $r$ and the $\kappa_t$ will therefore merely result in a change of location of the
distribution on the $r$-axis without a change in form:

(39) $h^*(r + \epsilon) = h(r).$

This could be expected since such a transformation means the addition of $\epsilon p$
to the quadratic form $q$ studied. It follows that the validity of (36) is not
limited to positive definite quadratic forms $q$, since any other quadratic form
can be transformed to a positive definite form by this operation if a sufficiently
large value of $\epsilon$ is taken.

The function $h(r)$ is a different analytic function between any two different
successive characteristic values $\kappa_t$ and $\kappa_{t+1}$. The expression (36) holds for even
and for odd values of $T$, and is also valid for any number of coincidences in
the set of characteristic values $\kappa_t$. It is true that integration along the paths
$\gamma'$ or $\gamma'_r$ entirely coincident with the real axis, such as has been introduced in
intermediate stages of the above proof, cannot be done if two or more of the
$\kappa_t$ coincide, because of divergence of the integral. Once (36) has been established
for distinct characteristic values, however, it follows from considerations of
continuity that this result holds good also if coincidences occur in the set $\kappa_t$.

The function $h(r)$ has been studied by von Neumann [4] by an entirely different
and very ingenious method for the special case that $T$ is even while no
two characteristic values are equal, and for the case that the characteristic
values are equal two by two but otherwise different. The properties established
by von Neumann, and some generalizations of these properties, can be derived
from (36). If $T$ is even, the derivative of $h(r)$ of order $\frac12 T - 1$

$\left(\frac{d}{dr}\right)^{\frac12 T - 1} h(r) = \frac{(\frac12 T - 1)!\, (-1)^{\frac12 (T + t - 3)}}{\pi \left\{ \prod_{s=1}^{T} |r - \kappa_s| \right\}^{\frac12}}$ if $\kappa_{t+1} < r < \kappa_t$ and $t$ odd,

     does not exist for $r = \kappa_t$, $t = 1, 2, \ldots, T$,

     $= 0$ for all other values of $r$.

If all characteristic values are distinct, all derivatives of an order lower than
$\frac12 T - 1$ exist and are continuous everywhere. Generally, whether $T$ be even
or odd, at a point where $k$ characteristic values coincide $\left(\frac{d}{dr}\right)^{j} h(r)$ will exist
and will be continuous if $j \leq \frac12 (T - k) - \frac32$, and will not exist if
$j \geq \frac12 (T - k) - 1$.
If the characteristic values are pairwise equal,

(40) $\kappa_{2s-1} = \kappa_{2s} = \lambda_s, \quad s = 1, 2, \ldots, S,$

but otherwise distinct, their total number $T = 2S$ must be even, and the only
singularities of the integrand in (36) are poles at the points $\kappa = \lambda_s$. Accordingly
the path of integration $\gamma_r$ can be considered as a closed curve, and the
integral in (36) can be replaced by the sum of the residues of the integrand at
all poles inside the curve:

(41) $h(r) = (S - 1) \sum_{s=1}^{v} \frac{(\lambda_s - r)^{S-2}}{P'(\lambda_s)}$ if $\lambda_{v+1} < r < \lambda_v$.

Here $P'(\lambda)$ is the derivative of

(42) $P(\lambda) = \prod_{s=1}^{S} (\lambda - \lambda_s),$

its value in the point $\lambda = \lambda_s$ being

(43) $P'(\lambda_s) = \left[ \frac{P(\lambda)}{\lambda - \lambda_s} \right]_{\lambda = \lambda_s} = \prod_{u \neq s} (\lambda_s - \lambda_u).$

For $S = 2$ this is simply the rectangular distribution

(44) $h(r) = \frac{1}{\lambda_1 - \lambda_2}, \quad \lambda_2 < r < \lambda_1.$
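
The residue formula (41) lends itself to direct computation. The Python sketch below is a minimal illustration, assuming $S \geq 2$ distinct values $\lambda_1 > \lambda_2 > \cdots > \lambda_S$ supplied in decreasing order; the function name `h_pairwise` and the numerical values are hypothetical. It reproduces the rectangular distribution (44) for $S = 2$ and checks that the density for $S = 3$ has total mass 1.

```python
def h_pairwise(r, lam):
    """Density (41) for pairwise-equal characteristic values lam[0] > lam[1] > ..."""
    S = len(lam)
    if not (lam[-1] < r < lam[0]):
        return 0.0
    total = 0.0
    for s, ls in enumerate(lam):
        if ls > r:                          # poles enclosed by the path gamma_r
            p_prime = 1.0
            for u, lu in enumerate(lam):
                if u != s:
                    p_prime *= ls - lu      # product (43)
            total += (ls - r) ** (S - 2) / p_prime
    return (S - 1) * total

# S = 2: rectangular distribution (44) between -1 and 1.
rectangular_value = h_pairwise(0.3, [1.0, -1.0])

# S = 3 with assumed values 3, 2, 1: total mass by the midpoint rule on (1, 3).
lam = [3.0, 2.0, 1.0]
steps = 2000
width = (lam[0] - lam[-1]) / steps
total_mass = sum(h_pairwise(lam[-1] + (k + 0.5) * width, lam) for k in range(steps)) * width
```

For the assumed values 3, 2, 1 the formula gives the triangular density $3 - r$ on $(2, 3)$ and $r - 1$ on $(1, 2)$.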

The numerical calculation of the distribution (36) with distinct characteristic
values is extremely cumbersome except for very small values of $T$. If the
characteristic values $\kappa_t$ follow some definite pattern, however, it may be possible
in some instances to work out a reasonable approximation formula. Two examples
of this type will be shown in section 5.


4. Another proof that covers also cases with $T < 5$. The proof of (36)
given above holds only for $T \geq 5$. Once the form of (36) is known or presumed,
however, another proof of its validity is available, which has mathematical
interest in itself, and covers all cases from $T = 2$ upwards. This is a proof by
complete induction, based on the proposition that, if (36) holds for $T$ variables,
then it also holds for $T + 1$ variables. This proposition again rests on the
recurrent relation

(45) $h_{T+1}(r') = \frac{\Gamma(\frac12 T + \frac12)}{\Gamma(\frac12)\, \Gamma(\frac12 T)}\, (r' - \kappa_{T+1})^{\frac12 T - 1} \int_{r'}^{\kappa_1} (r - r')^{-\frac12} (r - \kappa_{T+1})^{-\frac12 T + \frac12}\, h_T(r)\, dr,$

     if $\kappa_{T+1} < r' < \kappa_1$ and $\kappa_{T+1} < \kappa_T$,

proved elsewhere in this issue by von Neumann³ [5]. It connects the distribution
function $h_T(r)$ for $T$ variables with the function $h_{T+1}(r')$ obtained by the
addition of one variable $x_{T+1}$ and one characteristic value $\kappa_{T+1}$.

We shall substitute the "presumed" expression (36) for $h_T(r)$ with $T \geq 3$ in
(45) in order to show that the result for $h_{T+1}(r')$ is the same expression with $T$
increased by one. In this proof it has for simplicity's sake been assumed that
the new characteristic value $\kappa_{T+1}$ is smaller than any of those already present,
and that no two of the $\kappa_t$ are equal. It is then possible again to select in (36)
the path of integration $\gamma'_r$ which proceeds along the real axis from $r$ to $\kappa_1$ and
returns along the real axis to $r$, passing each singularity in the same way as $\gamma_r$
does. If the integral (36) is substituted in (45) in this form, the order of integration
over $\kappa$ and $r$ can be reversed, the result being

(46) $h_{T+1}(r') = \frac{(\frac12 T - 1)\, \Gamma(\frac12 T + \frac12)}{2\pi i\, \Gamma(\frac12)\, \Gamma(\frac12 T)}\, (r' - \kappa_{T+1})^{\frac12 T - 1} \int_{\gamma'_{r'}} \left\{ \prod_{t=1}^{T} (\kappa - \kappa_t) \right\}^{-\frac12} \left[ \int_{r'}^{\kappa} (r - \kappa_{T+1})^{-\frac12 T + \frac12} (r - r')^{-\frac12} (\kappa - r)^{\frac12 T - 2}\, dr \right] d\kappa.$

Writing for greater clarity $\kappa_{T+1} = a$, $r' = b$, $\kappa = c$, $r = z$, we have to evaluate
the integral

(47) $I_T(a, b, c) = \int_b^c (z - a)^{-\frac12 T + \frac12} (z - b)^{-\frac12} (c - z)^{\frac12 T - 2}\, dz, \quad a < b < c, \quad T \geq 3,$


with the positive square roots taken if $z$ is real and $b < z < c$. Suppose first
that $T = 2S + 1$ is odd. Then the integrand

(48) $\varphi_{2S+1}(z) = (z - a)^{-S} (z - b)^{-\frac12} (c - z)^{S - \frac32}$

has singularities at $a$, $b$ and $c$, of which only those at $b$ and $c$ are of a type such
that $\varphi_{2S+1}(z)$ changes its sign if the argument $z$ is turned once around the
singularity. It follows that

(49) $2 I_{2S+1} = \int_{\delta} \varphi_{2S+1}(z)\, dz,$

the path of integration $\delta$ being as indicated in Figure 3. For if the curve $\delta$ is
contracted so as to run entirely along the real axis, from $b$ to $c$ and back to $b$,
the two parts of the curve will each yield a contribution equal to $I_{2S+1}$, the
understanding being that positive square roots are taken when going from $b$ to $c$.

The integrand $\varphi_{2S+1}(z)$ is regular at $z = \infty$ and of order $z^{-2}$ in a neighborhood
of that point. It follows that


3 I am greatly indebted to Professor von Neumann for communicating this relation to
me before its publication.










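(50) $-2 I_{2S+1} = \int_{\epsilon} \varphi_{2S+1}(z)\, dz,$

where $\epsilon$, as in Figure 3, encloses the only singularity not enclosed by $\delta$. In a
neighborhood of $z = a$ the following expansion of $\varphi_{2S+1}(z)$ holds:

(51) $\varphi_{2S+1}(z) = (z - a)^{-S} \sum_{s=0}^{\infty} \frac{(z - a)^s}{s!} \left(\frac{\partial}{\partial a}\right)^{s} \left[ (a - b)^{-\frac12} (c - a)^{S - \frac32} \right].$

The only term contributing to (50) is that with $-S + s = -1$. Since we
selected a branch of $\varphi_{2S+1}(z)$ such that $(z - b)^{-\frac12} (c - z)^{S - \frac32}$ falls on the positive
pure imaginary axis for real values of $z$ below $b$, this term can be written

(52) $(z - a)^{-1}\, \frac{i}{(S - 1)!} \left(\frac{\partial}{\partial a}\right)^{S-1} \left[ (b - a)^{-\frac12} (c - a)^{S - \frac32} \right],$

where positive square roots should now be taken. The contribution of this
term in (50) is $2\pi i$ times the coefficient of $(z - a)^{-1}$, and therefore

(53) $I_{2S+1} = \frac{\pi}{(S - 1)!} \left(\frac{\partial}{\partial a}\right)^{S-1} \left[ (b - a)^{-\frac12} (c - a)^{S - \frac32} \right]$

     $= \frac{\pi\, \Gamma(S - \frac12)}{(S - 1)!\, \Gamma(\frac12)} \sum_{j=0}^{S-1} \binom{S-1}{j} (-1)^{S-1-j} (b - a)^{-j - \frac12} (c - a)^{j - \frac12}$

     $= \frac{\Gamma(S - \frac12)\, \pi}{\Gamma(\frac12)\, \Gamma(S)}\, (b - a)^{-S + \frac12} (c - a)^{-\frac12} (c - b)^{S - 1}.$

Figure 3. The integration paths $\delta$ and $\epsilon$

Since $\Gamma(\frac12) = \pi^{\frac12}$, it follows that for odd values of $T$

(54) $I_T(a, b, c) = \frac{\Gamma(\frac12)\, \Gamma(\frac12 T - 1)}{\Gamma(\frac12 T - \frac12)}\, (b - a)^{-\frac12 T + 1} (c - a)^{-\frac12} (c - b)^{\frac12 T - \frac32}.$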
It is easily seen that the same relation holds good if $T = 2S$ is even. In that
case it follows from (47) that

(55) $\left(\frac{\partial}{\partial c}\right)^{S-1} I_{2S} = (S - 2)!\, \frac{\partial}{\partial c} \int_b^c (z - a)^{-S + \frac12} (z - b)^{-\frac12}\, dz = (S - 2)!\, (c - a)^{-S + \frac12} (c - b)^{-\frac12}.$

In a manner similar to the transformations in (53) it can likewise be proved
that the right hand member in (54) has the same derivative of order $S - 1$
with respect to $c$. It follows that the two members of (54) differ by a polynomial
$Q(c)$ in $c$ of a degree at most equal to $S - 2$, the coefficients of which
may depend on $a$ and $b$. However, both members of (54) as well as their first
$S - 2$ derivatives with respect to $c$ vanish if $c = b$. Therefore $Q(c)$ vanishes
identically, and (54) holds for any integral values of $T$ not smaller than 3.
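
The closed form (54) can be compared with a direct numerical evaluation of (47). The Python sketch below is an illustration under assumptions: it removes the end-point singularities by the substitution $z = b + (c - b) \sin^2 \phi$ (a device introduced here, not taken from the text), applies the midpoint rule, and uses arbitrary values of $a$, $b$, $c$; the function names are hypothetical.

```python
import math

def integral_numeric(a, b, c, T, steps=20000):
    """Midpoint rule for (47) after the substitution z = b + (c - b) sin^2(phi)."""
    h = (math.pi / 2.0) / steps
    total = 0.0
    for k in range(steps):
        phi = (k + 0.5) * h
        z = b + (c - b) * math.sin(phi) ** 2
        # The factors (z - b)^(-1/2) and dz combine so that only cos^(T-3) remains.
        total += (z - a) ** (0.5 - 0.5 * T) * math.cos(phi) ** (T - 3)
    return 2.0 * (c - b) ** (0.5 * T - 1.5) * total * h

def integral_closed_form(a, b, c, T):
    """Right hand member of (54)."""
    gammas = math.gamma(0.5) * math.gamma(0.5 * T - 1.0) / math.gamma(0.5 * T - 0.5)
    return gammas * (b - a) ** (1.0 - 0.5 * T) * (c - a) ** (-0.5) * (c - b) ** (0.5 * T - 1.5)

# Ratio of the two evaluations for a few values of T, with assumed a, b, c.
ratios = {T: integral_numeric(0.0, 1.0, 2.0, T) / integral_closed_form(0.0, 1.0, 2.0, T)
          for T in (3, 4, 5, 8)}
```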

Finally, if (54) is inserted in (46) an expression for $h_{T+1}(r')$ is obtained which
corresponds to (36) with $T$ replaced by $T + 1$.

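It remains to prove (36) for some initial value of $T$. For $T = 2$ the integral
in (36) is divergent, but the form of $h(r)$ is easily found directly. Writing

(56) $p = x_1^2 + x_2^2, \quad r = \frac{q}{p} = \frac{\kappa_1 x_1^2 + \kappa_2 x_2^2}{x_1^2 + x_2^2},$

we find that

(57) $\frac{\partial(x_1, x_2)}{\partial(p, r)} = \left| \frac{\partial(p, r)}{\partial(x_1, x_2)} \right|^{-1} = \left| \det \begin{pmatrix} 2x_1 & 2x_2 \\ \frac{2x_1 (\kappa_1 - r)}{p} & \frac{2x_2 (\kappa_2 - r)}{p} \end{pmatrix} \right|^{-1} = \frac{p}{4\, |x_1 x_2|\, (\kappa_1 - \kappa_2)} = \frac{1}{4 (\kappa_1 - r)^{\frac12} (r - \kappa_2)^{\frac12}}.$

The probability density in the $x_1$-$x_2$-plane is, of course, $(2\pi)^{-1} e^{-\frac12 p}$, but in
making the transformation (57) a factor 4 must be applied to account for the
fact that to given values of $p$ and $r$ correspond 4 sets of values of $x_1$ and $x_2$,
differing in the signs only. This leads to the joint distribution of $p$ and $r$

(58) $\frac{e^{-\frac12 p}\, dp\, dr}{2\pi (\kappa_1 - r)^{\frac12} (r - \kappa_2)^{\frac12}},$

and, after integration over $p$, to

(59) $h_2(r) = \frac{1}{\pi (\kappa_1 - r)^{\frac12} (r - \kappa_2)^{\frac12}}$ if $\kappa_2 < r < \kappa_1$,

     $h_2(r) = 0$ if $r < \kappa_2$ or $\kappa_1 < r$,

in accordance with (39).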
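
The density (59) integrates to the cumulative function $(2/\pi) \arcsin \{(r - \kappa_2)/(\kappa_1 - \kappa_2)\}^{\frac12}$, which can be compared with simulated values of the ratio (56). The Python sketch below is a rough check only; the characteristic values, the seed, the sample size and the point $r_0$ are assumptions, and the function name is hypothetical.

```python
import math
import random

def arcsine_cdf(r, k1, k2):
    """Integral of the density (59) from k2 to r, for k2 <= r <= k1."""
    return (2.0 / math.pi) * math.asin(math.sqrt((r - k2) / (k1 - k2)))

rng = random.Random(7)                  # arbitrary seed
k1, k2 = 1.5, -0.5                      # assumed characteristic values
ratios = []
for _ in range(40000):
    a = rng.gauss(0.0, 1.0) ** 2        # x_1 squared
    b = rng.gauss(0.0, 1.0) ** 2        # x_2 squared
    ratios.append((k1 * a + k2 * b) / (a + b))

r0 = 0.2
empirical = sum(1 for r in ratios if r <= r0) / len(ratios)
theoretical = arcsine_cdf(r0, k1, k2)
```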
Finally, if (59) is inserted in (45) with $T = 2$, the result is

(60) $h_3(r') = \frac{1}{2\pi} \int_{[\kappa_2, r']}^{\kappa_1} \frac{dr}{(r - r')^{\frac12} (r - \kappa_3)^{\frac12} (\kappa_1 - r)^{\frac12} (r - \kappa_2)^{\frac12}}, \quad \kappa_3 < r' < \kappa_1,$

if $[\kappa_2, r']$ denotes the largest of $\kappa_2$ and $r'$. Writing $\kappa$ for $r$, we find that this
integral is equivalent to that in (36) for $T = 3$, taking into account the rule
established for selecting the branch of the integrand in (36). For, taking the
path of integration $\gamma'_{r'}$ coincident with the real axis, the equal contributions from
the two parts of the path between $\kappa_2$ and $\kappa_1$ reinforce each other, while for $r' < \kappa_2$
the remaining contributions (intervals between $r'$ and $\kappa_2$) add up to zero. This
completes the second proof of (36).








5. Application to serial correlation. We shall now derive the characteristic
values $\kappa_t$ in the case that

(61) $q = m = x_1 x_2 + x_2 x_3 + \cdots + x_{T-1} x_T.$

It will be of interest to compare this case with the slightly modified case of
the quadratic form

(62) $\bar m = x_1 x_2 + x_2 x_3 + \cdots + x_{T-1} x_T + x_T x_1,$

which contains an additional term $x_T x_1$ accomplishing a circular arrangement
of the variables. This modification was originally suggested by Hotelling in
order to simplify the characteristic polynomial. Other simplifications arising
out of the circular arrangement will appear below. It is possible, of course,
that the power of the test of significance of serial correlation is slightly affected
by the substitution of $\bar m$ for $m$, but this presumption needs corroboration by
a study of power functions.

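The characteristic values of $m$ are those values of $\kappa$ for which the determinant
of order $T$

(63) $\Delta_T = \begin{vmatrix} -\kappa & \frac12 & 0 & \cdots & 0 \\ \frac12 & -\kappa & \frac12 & \cdots & 0 \\ 0 & \frac12 & -\kappa & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & -\kappa \end{vmatrix} = 0.$

By development according to elements of the first row we find that

(64) $\Delta_T = -\kappa \Delta_{T-1} - \tfrac14 \Delta_{T-2},$

from which it follows that

(65) $\Delta_T = c_1 \xi_1^T + c_2 \xi_2^T,$

if $\xi_1$ and $\xi_2$ are the roots of

(66) $\xi^2 + \kappa \xi + \tfrac14 = 0,$

satisfying

(67) $\xi_1 + \xi_2 = -\kappa, \quad \xi_1 \xi_2 = \tfrac14.$

By inserting the known values of $\Delta_1$ and $\Delta_2$ in (65), the values of $c_1$ and $c_2$ are
easily found to be such that

(68) $\Delta_T = \frac{\xi_1^{T+1} - \xi_2^{T+1}}{\xi_1 - \xi_2}.$

Although as a polynomial in $\kappa$ this is a rather complicated expression, the
implicit form (68) will suffice for finding the roots of (63). Expressing all other
variables in terms of one new variable $\omega$,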








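(69) $\xi_1 = -\tfrac12 \omega, \quad \xi_2 = -\tfrac12 \omega^{-1}, \quad \kappa = \tfrac12 (\omega + \omega^{-1}),$

we find for (68),

(70) $\Delta_T = (-\tfrac12)^T\, \frac{\omega^{T+1} - \omega^{-T-1}}{\omega - \omega^{-1}} = \left(-\frac{1}{2\omega}\right)^T \frac{\omega^{2T+2} - 1}{\omega^2 - 1}.$

The only values of $\omega$ for which this expression vanishes are the roots of

(71) $\omega^{2T+2} = 1,$

excepting those that are also roots of

(72) $\omega^2 = 1.$

This leaves us with

(73) $\omega_t = e^{\pm \pi i t/(T+1)}, \quad t = 1, 2, \ldots, T.$

The corresponding characteristic values are

(74) $\kappa_t = \cos \frac{\pi t}{T + 1}, \quad t = 1, 2, \ldots, T,$

because the same value of $\kappa_t$ is obtained whether the positive or the negative
sign is taken in (73). These are $T$ different values $\kappa_t$, and hence each one is a
single root of (63).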
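
The result (74) can be verified numerically by evaluating the determinant through the recursion (64). The Python sketch below is an illustration; it assumes the starting values $\Delta_0 = 1$ and $\Delta_1 = -\kappa$ (the former is a convention adopted here so that (64) reproduces $\Delta_2 = \kappa^2 - \frac14$), and the function name `delta` and the value $T = 7$ are hypothetical choices.

```python
import math

def delta(T, kappa):
    """Determinant (63) via the recursion (64), with Delta_0 = 1 and Delta_1 = -kappa."""
    previous, current = 1.0, -kappa
    for _ in range(T - 1):
        previous, current = current, -kappa * current - 0.25 * previous
    return current

T = 7                                                   # assumed number of observations
roots = [math.cos(math.pi * t / (T + 1)) for t in range(1, T + 1)]
residuals = [delta(T, k) for k in roots]                # each should vanish, by (74)
distinct = len({round(k, 9) for k in roots})            # number of different roots
```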

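The characteristic values of $\bar m$ can now be derived from (68), although a
simple straightforward method based on the properties of circulants is also
available (see [6], p. 13). Writing

(75) $\bar\Delta_T = \Delta_T + 2 (-1)^{T-1} (\tfrac12)^T - \tfrac14 \Delta_{T-2},$

we find easily from (70) that

(76) $\bar\Delta_T = (-\tfrac12)^T \left\{ \frac{\omega^{T+1} - \omega^{-T-1}}{\omega - \omega^{-1}} - \frac{\omega^{T-1} - \omega^{-T+1}}{\omega - \omega^{-1}} - 2 \right\} = (-\tfrac12)^T (\omega^T + \omega^{-T} - 2) = -(-\tfrac12)^{T-1} (\cos T\alpha - 1),$

if

(77) $\omega = e^{i\alpha}.$

A complete set of the values $\omega_t$ for which $\bar\Delta_T$ vanishes is found from

(78) $\alpha_t = \frac{2\pi t}{T}, \quad t = 1, 2, \ldots, T,$

and the corresponding characteristic values⁴ $\bar\kappa_t$ are, according to (69),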


4 In order to simplify the formulae, the numbering of characteristic values according
to decreasing size has been abandoned in (79).








(79) $\bar\kappa_t = \cos \alpha_t = \cos \frac{2\pi t}{T}, \quad t = 1, 2, \ldots, T.$

In contradistinction to the case without circular arrangement, the characteristic
values with indices $t$ and $T - t$ now coincide, such that all characteristic values
are double except one ($\bar\kappa_T = 1$) if $T$ is odd, and except two ($\bar\kappa_T = 1$, $\bar\kappa_{\frac12 T} = -1$)
if $T$ is even.
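
Both statements, the values (79) and their multiplicities, can be checked on the matrix of the circular form (62). The Python sketch below is an illustration and not the circulant argument of [6]: it verifies that the vectors with components $\cos(2\pi t j/T)$ and $\sin(2\pi t j/T)$ are reproduced by the matrix up to the factor $\cos(2\pi t/T)$, and counts coincident values. All function names are hypothetical.

```python
import math
from collections import Counter

def circular_matrix(T):
    """Matrix of the quadratic form (62): weight 1/2 between cyclic neighbours."""
    A = [[0.0] * T for _ in range(T)]
    for j in range(T):
        A[j][(j + 1) % T] += 0.5
        A[(j + 1) % T][j] += 0.5
    return A

def max_eigen_residual(T):
    """Largest component of A v - kappa v over the trial vectors for t = 1, ..., T."""
    A = circular_matrix(T)
    worst = 0.0
    for t in range(1, T + 1):
        kappa = math.cos(2.0 * math.pi * t / T)
        for trig in (math.cos, math.sin):       # the sine vector may vanish identically
            v = [trig(2.0 * math.pi * t * j / T) for j in range(T)]
            Av = [sum(a * b for a, b in zip(row, v)) for row in A]
            worst = max(worst, max(abs(x - kappa * y) for x, y in zip(Av, v)))
    return worst

def multiplicities(T):
    """Sorted multiplicities of the values cos(2 pi t / T), t = 1, ..., T."""
    counts = Counter(round(math.cos(2.0 * math.pi * t / T), 9) for t in range(1, T + 1))
    return sorted(counts.values())

odd_case = multiplicities(7)     # one single value, the others double
even_case = multiplicities(8)    # two single values, the others double
```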

Taking advantage of the duplicity of almost all characteristic values, Anderson
[6] has derived expressions equivalent to (36) for this case, using methods
that depend on this particular condition. On the basis of these results he has
computed 99- and 95-percentiles in the distribution of $\bar r = \bar m / p$ for the values
$T = 2, 3, 4, 5, 6, 7, 9, 11, 13, 15, 25, 45$, interpolating the percentiles for
intermediate values of $T$. The 95-percentile for $T = 45$ is 0.240, as compared with
0.261 for the normal distribution that provides an asymptotic approximation.

Whereas on this showing the normal approximation is slow in becoming accurate
with increasing $T$, a method for obtaining a much closer approximation
is available, which works out simplest with respect to $\bar r$, but can also be applied
to $r$. The principle of this method is applicable whenever the characteristic
values follow a definite mathematical pattern.

The method consists in replacing the finite number of discrete values $\bar\kappa_t$ in
(36) by a continuous variable $\lambda$, distributed according to a density function
suggested by, and as closely as possible approximating to, the scatter of the
values $\bar\kappa_t$. According to (79) the values $\bar\kappa_t$ are ordinates of the cosine function
at equidistant points spaced out so as to cover one complete period $2\pi$ of that
function. It is natural to approximate this scatter by the density function

(80) $\chi(\lambda) = \frac{T}{\pi} (1 - \lambda^2)^{-\frac12}$

of the cosine $\lambda = \cos \tau$ of an expression in which the variable $\tau$ has a rectangular
distribution between 0 and $\pi$. The numerical factor in (80) is such that

(81) $\int_{-1}^{1} \chi(\lambda)\, d\lambda$

equals the total number of characteristic values to be replaced by a density
function. The idea underlying the substitution of $\chi(\lambda)$ for the $\bar\kappa_t$ is to obtain
what intuitively seems to be in some sense the closest approximation to the
exact distribution function $h(\bar r)$ that has continuous derivatives of any order
in any point except the two points ($\bar r = -1$ and $\bar r = +1$) that limit its
range.

The factor in the integrand in (36) which involves the $\bar\kappa_t$ is approximated as
follows:

(82) $\prod_{t=1}^{T} (\kappa - \bar\kappa_t)^{-\frac12} = \exp \left[ -\tfrac12 \sum_{t=1}^{T} \log (\kappa - \bar\kappa_t) \right] \approx \exp \left[ -\frac{T}{2\pi} \int_{-1}^{1} \frac{\log (\kappa - \lambda)}{(1 - \lambda^2)^{\frac12}}\, d\lambda \right].$








In order to evaluate the integral

(83) $J = \int_{-1}^{1} \frac{\log (\kappa - \lambda)}{(1 - \lambda^2)^{\frac12}}\, d\lambda$

we shall first prove that its real part is independent of $\kappa$, or that

(84) $\Re \int_{-1}^{1} \frac{\log (\kappa - \lambda) - \log (-\lambda)}{(1 - \lambda^2)^{\frac12}}\, d\lambda = 0,$

if $\Re$ denotes "the real part of". The integrand in (84) has singularities at the
points $\lambda = -1, 0, \kappa, 1$. These are of two types. The singularities $\lambda = \pm 1$
are introduced by the denominator and make the integrand change its sign if
the argument $\lambda$ is turned once around either singularity. If starting from a
point on the real axis we turn the argument $\lambda$ once around either of the other
singularities, $\lambda = 0$ and $\lambda = \kappa$, introduced by the numerator, then the real
part of the integrand is not affected, while $2\pi i$ or $-2\pi i$ is added to the imaginary
part of the numerator, depending on the sense (clockwise or anti-clockwise) of
the rotation and on the sign of the logarithm in (84) responsible for the singularity.

Figure 4. The integration path $\beta$

It follows that one revolution along a closed curve $\beta$ containing all four
of the singularities, as indicated in Figure 4, carries us back to the same branch
of the integrand, after mutually offsetting additions to the imaginary part of
the numerator and after two changes in sign. This is in accordance with the
regular character of the integrand at the point $\lambda = \infty$.

It follows furthermore that the left hand member of (84) can be replaced by

(85) $\Re\, \tfrac12 \int_{\beta} \frac{\log (\kappa - \lambda) - \log (-\lambda)}{(1 - \lambda^2)^{\frac12}}\, d\lambda.$

For, if the curve $\beta$ is constricted to a path $\beta'$ running along the real axis from $-1$
to $+1$ and back to $-1$, the contributions of the two halves of the path will be
equal to each other, also with respect to sign. This is also true for the parts
of the path $\beta'$ between 0 and $\kappa$, because the behavior of the real part

$\log |\kappa - \lambda| - \log |\lambda|$

of the numerator in passing either of the points 0 and $\kappa$ is independent of the
side along which the singularity is passed⁵.


5 For the same reason it is not necessary to specify in (84) on what sides these singularities
are passed, although this is necessary with respect to $\kappa$ in (83) where the imaginary part has
not been eliminated.










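Finally, if $\theta$ in (85) is replaced by a large circle $|\lambda| = R$, the validity of (84)
follows from the fact that (85) tends to zero if $R$ tends to infinity because the
integrand is of the order of magnitude of $\lambda^{-2}$.

The real part of the integral in (83) accordingly is

$$\frac{1}{\pi}\int_{-1}^{1}\frac{\log|\lambda|}{(1-\lambda^2)^{1/2}}\,d\lambda = \frac{2}{\pi}\int_{0}^{1}\frac{\log\lambda}{(1-\lambda^2)^{1/2}}\,d\lambda$$

or, by the transformation $\lambda = \sin x$,

$$\Re J = \frac{2}{\pi}\int_0^{\pi/2}\log\sin x\,dx = \frac{2}{\pi}\int_0^{\pi/2}\log\cos x\,dx = \frac{1}{\pi}\int_0^{\pi/2}\log(\sin x\cos x)\,dx = \frac{1}{\pi}\int_0^{\pi/2}\log(\tfrac12\sin 2x)\,dx = \tfrac12\log\tfrac12 + \tfrac12\Re J, \tag{87}$$

so that

$$\Re J = -\log 2. \tag{88}$$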
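
The value in (88) can be checked numerically. The following is a minimal sketch, assuming a plain midpoint rule for the integral of log sin x in (87); the function name `real_part_J` and the number of subintervals are illustrative choices and do not come from the paper.

```python
import math

def real_part_J(n=200_000):
    """Midpoint-rule estimate of (2/pi) * integral_0^{pi/2} log(sin x) dx."""
    h = (math.pi / 2) / n
    # Midpoints avoid the logarithmic singularity at x = 0.
    total = sum(math.log(math.sin((k + 0.5) * h)) for k in range(n))
    return (2 / math.pi) * total * h

estimate = real_part_J()
print(estimate, -math.log(2))
```

The two printed numbers should agree to several decimals; the small remaining gap comes from the integrable singularity at the lower limit.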


In order to evaluate the imaginary part $\Im J$ of (83), it is necessary to specify
on which side the singularity $\kappa$ is passed by the integration variable $\lambda$. In fact,
both cases need to be considered: the passage of $\lambda$ "over" $\kappa$ for values of $\kappa$ on
the first part of the path of integration $\gamma_1$ of $\kappa$ in (36), where $\kappa$ goes along the
real axis from $r$ to 1; and the passage of $\lambda$ "under" $\kappa$ for values of $\kappa$ on the second
part of its path $\gamma_1$, from 1 back to $r$. If the upper sign in the following formulae
relates to the first of these two cases, we have





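$$\Im J = \mp\int_{\kappa}^{1}\frac{d\lambda}{(1-\lambda^2)^{1/2}} = \mp\arccos\kappa, \tag{89}$$

and, from (88) and (89), we find for the last member in (82)

$$e^{\frac12 TJ} = 2^{-\frac12 T}\,e^{\mp\frac12 Ti\arccos\kappa}. \tag{90}$$

Writing

$$\arccos\kappa = \alpha,\qquad \kappa = \cos\alpha,\qquad e^{\frac12 Ti\alpha} - e^{-\frac12 Ti\alpha} = 2i\sin\tfrac12 T\alpha, \tag{91}$$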


we find the following approximation for $h(\bar r)$ by inserting (90) in (36) as indicated
in (82):


a 4T parc cost 
(92) KA~ =e I ine ~ 0F°* da Pudine de 
0 


Calculations of the distribution function and of its percentiles will be much 
simpler for this approximation than for the exact function. 

In the case of r = m/p in which no circular arrangement is made a slight
complication arises. The characteristic values $\kappa_t$ given in (74) are again ordinates
of the cosine function at equidistant points, but they do not cover a complete
period or half-period of this function. Probably the most accurate procedure
would be to replace the limits of integration in (83) by $\cos[\tfrac12\pi/(T+1)]$
and $\cos[(T+\tfrac12)\pi/(T+1)]$, so as to have each discrete integral value of $t$ in
(74) contribute an interval $(t - \tfrac12, t + \tfrac12)$ of unit length to the range of
the rectangularly distributed variable $\tau$ now defining the distribution of
$\lambda = \cos[\tau\pi/(T+1)]$; while making such an adjustment in the numerical
factor in (80) that the equivalent of (81) with the new limits of integration is
satisfied. However, the evaluation of (83) and the simplicity of the result essentially
rest on the fact that the limits of integration coincide with singularities
of the integrand. In these circumstances a rather simple result can again be
obtained by introducing two further changes which very nearly compensate
each other. The first change is the arbitrary extension of the limits of integration
to what they are in (83), while increasing the numerical factor in (80) in
such a manner that the integral in (81) will be $T + 1$ instead of $T$. This leaves
the described contributions of the discrete values of $t$ in (74) to the range of $\tau$
unaffected, but adds to that range the two intervals $(0, \tfrac12)$ and $(T + \tfrac12, T + 1)$ of
half a unit length not representing anything that was already present. This
can be largely offset by introducing two additional discrete values $t = 0$ and
$t = T + 1$, each with the negative weight $-\tfrac12$, if the weight of all other discrete
values is considered to be $+1$. Instead of (82) we then have


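$$e^{\frac12\sum_{t=1}^{T}\log(\kappa-\kappa_t)} \sim \exp\left[\frac{T+1}{2\pi}\int_{-1}^{1}\frac{\log(\kappa-\lambda)}{(1-\lambda^2)^{1/2}}\,d\lambda - \tfrac14\left\{\log(\kappa-1) + \log(\kappa+1)\right\}\right] \tag{93}$$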


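If this expression is inserted in (36) with $\gamma$ constricted to $\gamma_1$, the argument of

$$e^{-\frac14\log(\kappa-1)} = (\kappa-1)^{-1/4} \tag{94}$$

is $-\pi/4$ when $\kappa$ goes from $r$ to 1, and $\pi/4$ when $\kappa$ returns from 1 to $r$. On
account of

$$(1-\kappa^2)^{-1/4} = \sin^{-1/2}\alpha,$$

$$e^{\frac12(T+1)i\alpha - \pi i/4} - e^{-\frac12(T+1)i\alpha + \pi i/4} = 2i\sin\left[\tfrac12(T+1)\alpha - \pi/4\right],$$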
the result now is 
(47 a 1)Q*7*+ 


T 


(95) 


h(r) ~ 
(96) arc cos rT 
‘| (cos a — r)*?* sin [3(7 + lla — 2/4] sin®” @ da. 


It is not necessary to prove by direct integration that the conditions equivalent
to (37) are satisfied by the approximate expressions (92) and (96). This follows
from the fact that the difference of 2 between the degrees in $\kappa$ of the numerator
and the denominator in (36) is preserved by the substitutions (82) and (93);
that the numerical value of the limit for $\kappa \to \infty$ of $\kappa^2$ times the integrand in (36)
is not changed; and that no singularities outside the segment $-1 \le \kappa \le 1$ of
the real axis are introduced.

There is, of course, a certain degree of distortion involved in replacing the
exact distribution functions by the smooth approximations derived. Such distortion
is most serious in so far as it occurs at the tails of the distribution, where
the usual significance limits are located. For instance, the exact distribution of
$\bar r$ is asymmetric if $T$ is odd, and ranges from $\cos[(T-1)\pi/T]$ to $+1$, whereas
the smooth approximation is symmetric and ranges from $-1$ to $+1$. In the
case of $r$ both the exact distribution and the approximation are symmetric, but
the former ranges from $\cos[T\pi/(T+1)]$ to $\cos[\pi/(T+1)]$, the latter from
$-1$ to $+1$. However, this difference is to some extent compensated by a curious
anomaly in the function (96). This function actually dips below zero on symmetrically
placed small intervals adjoining $-1$ and $+1$, the length of which is
of the order of the difference $1 - \cos[\pi/(T+1)]$ between unity and the highest
characteristic value. Percentiles must therefore be counted on both sides from
two points absolutely smaller than unity, defined by requiring that the small
parts of the area "under" the curve (95) outside these points are algebraically
zero each.

These distortions have importance only for small values of $T$. Anderson
finds ([6] p. 52) that the exact function $h(\bar r)$ is symmetrical within three-decimal
accuracy for all values of $T \ge 11$ (the modal value $h(0)$ for $T = 11$ is about 1.27).
There are in the case of $\bar r$ three characteristic values $\kappa_t$ exceeding the 95-percentile
as given by Anderson for $T = 7$; 5 for $T = 13$; 11 for $T = 25$. Corresponding
numbers for the 99-percentile are 3 for $T = 13$; 9 for $T = 25$; 17 for $T = 45$.
These numbers suggest that the approximations (92) and (96) will provide good
significance limits long before the normal approximation is acceptable. Accurate
calculations will be needed to find out from what value of $T$ onward the approximations
can safely be substituted for the exact distributions.


REFERENCES 


[1] J. Tinbergen, Business Cycles in the United States of America 1919-1932, League of
Nations, Geneva, 1939.
M. M. Flood, "Recursive methods in business-cycle analysis," Econometrica, Vol. 8
(1940), p. 333.

[2] H. Wold, A Study in the Analysis of Stationary Time Series, Uppsala, 1938.

[3] S. S. Wilks, Statistical Inference, Ann Arbor, 1937.

[4] J. von Neumann, "Distribution of the ratio of the mean square successive difference
to the variance," Annals of Math. Stat., Vol. 12 (1941), pp. 367-395.

[5] J. von Neumann, "A further remark concerning the distribution of the ratio of the
mean square successive difference to the variance," Annals of Math. Stat., Vol.
13 (1942), pp. 86-88.

[6] R. L. Anderson, Serial Correlation in the Analysis of Time Series, unpublished thesis,
Iowa State College, 1941.





A GENERALIZED ANALYSIS OF VARIANCE 
By FRANKLIN E. SATTERTHWAITE 


University of Iowa and Aetna Life Insurance Company 


The analysis of variance is a statistical technique whose fields of application 
are only beginning to be explored. A few simple standard designs appear in the 
literature and a great deal has been done with them. However, if the applied 
statistician limits himself to such standard designs, he soon finds that many of 
his problems are receiving inadequate or inappropriate treatment. The writer 
has found this particularly true in his own field where most of the raw data are 
in the nature of frequencies or averages which lack homogeneity of variance. 
Also the nature of the problem usually indicates the use of weighted averages 
rather than simple averages and sometimes part of the data are missing. 

The purpose of this study is to examine the fundamental principles under- 
lying analysis of variance designs and to show how designs may be constructed 
and applied to practically any data which can be assumed to be normally 
distributed. 


1. Test of independence. In the analysis of variance we calculate two or
more statistics of the types,

$$\chi^2 = \sum (x_i - m_i)^2,$$

$$\chi^2 = \sum \theta_j^2.$$

The $x_i$'s are considered to be independent variables from a normal population.
The $m_i$'s and the $\theta_j$'s are homogeneous linear functions of the $x_i$'s. Heretofore
the demonstration of the independence of the $\chi^2$'s used has only been made for
certain special $\theta_j$'s and $m_i$'s. To make our analysis general we shall let our $\theta_j$'s
be general homogeneous linear functions of the $x_i$'s and we shall define our $m_i$'s
through certain linear homogeneous restrictions.
Let us define Chi-square as

$$\chi^2 = \sum (x_i - m_i)^2$$

where the $x_i$'s are independent normally distributed variables with mean zero
and unit variance. We also define certain linear functions of the $x_i$'s,¹

$$\theta_j = a_{ji}x_i, \qquad j = 1, 2, \cdots s, \tag{1}$$

which we shall assume to have been orthogonalized.² To define the $m_i$'s we
make use of the linear restrictions

$$a_{ji}(x_i - m_i) = 0,$$

$$a_{ji}m_i = a_{ji}x_i = \theta_j. \tag{2}$$

This system has an $(n - s)$-infinitude of solutions and we should not expect
all of these to be suitable for our purposes. For reasons which will appear later
we shall choose the single solution,

$$m_i = a_{ji}\theta_j = a_{jk}a_{ji}x_k, \qquad j = 1, \cdots s. \tag{3}$$

This is the solution which follows if we complete the system (2) with $n - s$
additional linear restrictions on the $m_i$'s which are homogeneous and which form
an orthogonal set with (2). Thus

$$a_{ji}m_i = \theta_j, \qquad j = 1, \cdots s,$$

$$a_{ji}m_i = 0, \qquad j = s+1, \cdots n.$$

This is consistent with standard analysis of variance designs. For the usual
one way analysis, we have

$$\sum_{i=1}^{n_j}\frac{1}{\sqrt{n_j}}\,m_{ij} = \sum_{i=1}^{n_j}\frac{1}{\sqrt{n_j}}\,x_{ij}, \qquad j = 1, \cdots s, \tag{4}$$

which yield a solution according to (3),

$$m_{ij} = \frac{1}{\sqrt{n_j}}\sum_{k=1}^{n_j}\frac{1}{\sqrt{n_j}}\,x_{kj}, \qquad j = 1, \cdots s.$$

The additional homogeneous restrictions in this case might have been taken as

$$m_{1j} = m_{2j} = \cdots = m_{n_j j}, \qquad j = 1, \cdots s,$$

which are orthogonal to (4) and may be easily orthogonalized among themselves.
Substituting the values of the $m_i$'s obtained in (3) into Chi-square, we obtain,

$$\chi^2 = (x_i - m_i)(x_i - m_i)$$

$$= (\delta_{ik} - a_{ji}a_{jk})x_k(\delta_{il} - a_{mi}a_{ml})x_l, \qquad j, m = 1, \cdots s,$$

$$= (\delta_{kl} - a_{mk}a_{ml} - a_{jl}a_{jk} + \delta_{jm}a_{jk}a_{ml})x_kx_l$$

$$= (\delta_{kl} - a_{jk}a_{jl})x_kx_l.$$

¹ A repeated lower case subscript will always indicate summation with respect to that
subscript. All subscripts range from 1 to $n$ unless otherwise specified. The Kronecker
Delta, $\delta_{ij}$, equals one or zero depending on whether $i$ equals or does not equal $j$.

² The $\theta_j$'s are orthogonal if $a_{ik}a_{jk} = \delta_{ij}$. Any algebraically independent set may be
replaced by an equivalent orthogonal set. Thus, if $\theta_2$ is not orthogonal to $\theta_1$ it may be
replaced by $\theta_2' = \theta_2 + k\theta_1$, where $k$ is determined by

$$a_{1j}(a_{2j} + ka_{1j}) = 0$$

or $k = -a_{1j}a_{2j}/a_{1j}a_{1j}$.
The condition $\sum a_{1j}^2 = \sum a_{2j}^2 = 1$ can always be met by simple division.




The sum of the squares of the $\theta_j$'s is

$$\sum\theta_j^2 = \theta_j\theta_j = a_{jk}a_{jl}x_kx_l.$$

Therefore we have the relation,

$$\chi^2 + \sum\theta_j^2 = \delta_{kl}x_kx_l = \sum x_i^2. \tag{5}$$

The rank, $R_j$, of each $\theta_j^2$ is obviously equal to unity since it is the square of a
linear form. The rank, $R_0$, of $\chi^2$ is at least equal to $n - s$ since the rank of the
right hand side of (5) is $n$. Also, $R_0$ can not be greater than $n - s$ since,

$$a_{jk}(\delta_{kl} - a_{ik}a_{il}) = a_{jl} - \delta_{ij}a_{il}, \qquad i, j = 1, \cdots s,$$

$$= 0$$

gives $s$ independent relations between the rows of its coefficient matrix. Therefore
we have the relation,

$$R_0 + R_1 + \cdots + R_s = n. \tag{6}$$

The two conditions, (5) and (6), are sufficient³ conditions for $\chi^2$ and the $\theta_j^2$'s
each to be independent of the others and each to be distributed as is Chi-square
with the number of degrees of freedom equal to its rank.
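
The construction of this section can be illustrated numerically. The sketch below is only an assumption-laden example: it orthonormalizes two hypothetical restriction rows (group totals for a one-way layout with two groups of three) in the manner of footnote 2, forms the $\theta_j$'s of (1) and the $m_i$'s of (3), and checks relation (5). The helper name `gram_schmidt`, the data, and the choice of restrictions are not taken from the paper.

```python
import math
import random

def gram_schmidt(rows):
    """Replace independent rows by an equivalent orthonormal set (footnote 2)."""
    basis = []
    for row in rows:
        v = [float(c) for c in row]
        for b in basis:
            k = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - k * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        basis.append([vi / norm for vi in v])
    return basis

random.seed(0)
n = 6
x = [random.gauss(0.0, 1.0) for _ in range(n)]  # hypothetical unit-variance data

# Hypothetical restrictions (s = 2): one linear form per group.
a = gram_schmidt([[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]])

theta = [sum(a_j[i] * x[i] for i in range(n)) for a_j in a]              # (1)
m = [sum(a[j][i] * theta[j] for j in range(len(a))) for i in range(n)]   # (3)

chi2 = sum((x[i] - m[i]) ** 2 for i in range(n))
lhs = chi2 + sum(t * t for t in theta)   # left side of (5)
rhs = sum(xi * xi for xi in x)           # right side of (5)
print(lhs, rhs)
```

For this layout each $m_i$ comes out as the mean of its own group, which is the usual one-way result.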


2. Adjustment of data. The above development is not general enough for
many practical problems. We do not always have given data, $y_i$, which are
normally distributed about a mean zero with unit (or homogeneous) variance.
Of course if the means, $m_i$, and variances, $\sigma_i^2$, are known, we may make the
transformation,

$$x_i = \frac{y_i - m_i}{\sigma_i} \tag{7}$$

and apply our theory in a straightforward manner. We shall now check the
effect on our analysis if the $m_i$'s and $\sigma_i$'s are determined, in part at least, from
our data, the $y_i$'s.

Let us assume that the $x_i$'s of (7) are normally and independently distributed
variables about a mean zero and with unit variance. Let us also define certain
linear orthogonal functions of the first $r$ of the $\theta_j$'s by

$$\phi_k = b_{kj}\theta_j = b_{kj}a_{ji}x_i = b_{kj}a_{ji}\left(\frac{y_i - m_i}{\sigma_i}\right), \qquad k = 1, 2, \cdots q, \quad j = 1, 2, \cdots r. \tag{8}$$

We next form the characteristic function of the joint distribution of $\chi^2$, of
$\sum_1^r\theta_j^2$, of $\theta_{r+1}^2, \cdots \theta_s^2$ and of $\phi_1, \cdots \phi_q$. This is


³ See A. T. Craig, "On the independence of certain estimates of variance," Annals of
Math. Stat., Vol. 9 (1938), pp. 46-55.









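$$\Phi(t, u, v_{r+1}, \cdots, v_s, w_1, \cdots w_q) = K\int\cdots\int\exp\Big[it\chi^2 + iu\sum_{1}^{r}\theta_j^2 + i\sum_{r+1}^{s}v_j\theta_j^2 + i\sum_{1}^{q}w_j\phi_j - \tfrac12\sum x_i^2\Big]\,dx_1\cdots dx_n.$$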
The conditions (5) and (6) are sufficient⁴ for there to exist an orthogonal
transformation of the $x_i$'s which will convert

$\theta_j$ to $\theta_j$, $j = 1, \cdots s$,

$\chi^2$ to $\sum_{s+1}^{n}\theta_j^2$,

$\sum x_i^2$ to $\sum_{1}^{n}\theta_j^2$,

$dV = \prod dx_i$ to $\prod d\theta_j$.


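The characteristic function then takes the form,

$$\Phi = K\Big\{\prod_{1}^{r}\int\exp\Big[-\tfrac12(1-2iu)\Big(\theta_j - \frac{i\,w_kb_{kj}}{1-2iu}\Big)^2\Big]d\theta_j\Big\}\Big\{\prod_{1}^{q}\exp\big[-w_j^2/2(1-2iu)\big]\Big\}\Big\{\prod_{r+1}^{s}\int\exp\big[-\tfrac12(1-2iv_j)\theta_j^2\big]d\theta_j\Big\}\Big\{\prod_{s+1}^{n}\int\exp\big[-\tfrac12(1-2it)\theta_j^2\big]d\theta_j\Big\},$$

where

$$\sum_j(w_kb_{kj})^2 = \sum w_j^2,$$

since the $b_{kj}$'s are orthogonal.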

At the beginning of this section we stated that we wished in some way to use
our data, the $y_i$'s, to estimate the $m_i$'s and the $\sigma_i$'s. A suitable method is to
restrict the $\phi$ functions, (8), to zero.

Our problem thus reduces to finding the distribution of the "array" in our
joint distribution for which

$$\phi_1 = \phi_2 = \cdots = \phi_q = 0.$$

Except for perhaps a constant factor, the characteristic function of the distribution
of such an array is obtained from $\Phi$ by integrating out the $w_j$'s.⁵ Thus,
on performing the integrations, we have,


⁴ See A. T. Craig, ibid.

⁵ This is easily seen since if one passes from the characteristic function to the joint
distribution, equates the $\phi_j$'s to zero, and then passes back to the characteristic function,
all the integrations except the above appear in pairs of the form

$$\frac{1}{2\pi}\int\!\!\int\cdots\,dt\,dx,$$

which leave $\Phi$ unchanged.







$$\Phi'(t, u, v_{r+1}, \cdots, v_s) = K\{(1-2iu)^{-(r-q)/2}\}\Big\{\prod_{r+1}^{s}(1-2iv_j)^{-1/2}\Big\}\{(1-2it)^{-(n-s)/2}\} = \Phi(u)\Big\{\prod_{r+1}^{s}\Phi_j(v_j)\Big\}\Phi(t),$$

which shows that $\sum_1^r\theta_j^2$, $\theta_{r+1}^2, \cdots \theta_s^2$, and $\chi^2$ are each independent of the others
and that each is distributed according to the Chi-square distribution with
$r - q$, $1, \cdots 1$, and $n - s$ degrees of freedom respectively.


3. Numerical application. The developments of the preceding sections have 
been abbreviated to cover technical points alone. We shall now take a definite 
practical problem and see how we may work out its solution with the aid of the 
above techniques. 

In Table I are given the losses, the exposures (in car years) and the indicated 
pure premiums from the Massachusetts Statutory Liability automobile insur- 
ance experience for four towns and for three different classes of cars. (To 
illustrate the effect of missing items, the data for town D, class W, and for 
town C, class Y, have been omitted.) Our problem is to determine if there is a 
significant variation in the indicated pure premium between the different towns 
and between the different classes of cars. 

Our first problem is to set up a normally distributed variable about a mean
zero and with homogeneous variance. The true mean, $m_i$, of the distribution
of the indicated pure premiums, $P_i$, is unknown. Under the hypothesis that
the different towns and classes of cars are homogeneous with each other, we
may assume that the $m_i$'s are all equal. We may estimate their value by using
the combined indicated $\bar P$ for the whole territory, which is \$32.44. By a preliminary
argument, which need not concern us here, we show that the variance,
$\sigma_i^2$, of an indicated pure premium is inversely proportional to the exposure, $E_i$,
on which it is based but the constant of proportionality is unknown. If we
now make the assumption that the indicated pure premiums are normally
distributed, we may convert them to the form

$$x_i = \frac{P_i - 32.44}{1/E_i^{1/2}}$$

which will be normally distributed about a mean zero with homogeneous variance.
We have calculated these statistics and entered them in the table.
Because the expected value of $P_i$, \$32.44, was estimated from our data, the $x_i$'s
are subject to the single homogeneous linear restriction,

$$0 = \sum(L_i - \bar PE_i) = \sum E_i^{1/2}\,\frac{P_i - \bar P}{1/E_i^{1/2}} = \sum E_i^{1/2}x_i. \tag{9}$$
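
The conversion to the $x_i$'s and the restriction (9) can be sketched in a few lines. The exposures and pure premiums below are invented for illustration and are not the Table I figures; the variable names are likewise hypothetical.

```python
import math

# Hypothetical exposures E_i (car years) and indicated pure premiums P_i.
exposures = [1200.0, 800.0, 450.0, 2300.0]
premiums = [31.0, 35.5, 28.2, 33.1]

# Combined indicated pure premium: total losses over total exposure.
losses = [e * p for e, p in zip(exposures, premiums)]
p_bar = sum(losses) / sum(exposures)

# x_i = (P_i - P_bar) / (1 / sqrt(E_i)), homogeneous in variance under the hypothesis.
x = [(p - p_bar) * math.sqrt(e) for p, e in zip(premiums, exposures)]

# Restriction (9): sum of sqrt(E_i) * x_i vanishes because P_bar came from the data.
restriction = sum(math.sqrt(e) * xi for e, xi in zip(exposures, x))
print(p_bar, restriction)
```

The restriction is zero up to rounding, which is why one degree of freedom is lost among the town (or class) comparisons.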





(QT1‘Z)|(68I ZF) |(HHO SE) 929 ‘ZZ |900°6F |SI9‘9T |89S‘9% | E9z‘Zb | ZZ0'9 |(TSO0'EZT) HD Z/,(*2'D Z) = 10 tt = fg 
Iss} 9912) 228110 62— |8'SI— | SIS | 8 22r| S8'2el| 6'OLF "0g 
9r8°— | 80L° | 2OS° | OFF | FOF "og/ong = 'y 
0} 26F a b9r Hog/! ong = My 

O10‘Z | 0629 | O1F‘T | OLT‘S | OT8‘9 | OFE‘OT 


oo 
oO 
Z 
=< 
— 
% 
< 
> 
a 
° 
mn 
— 
m2 
~ 
+) 
<a 
z 
< 


SBSSx 


008 ‘T&T 


006 ‘F0F 

00S ‘ZT 

006‘ZOI | 140‘E 
61° ZE |00S ‘$ST | 008‘F 


< MOR | <MOAH 
ae 


38328 
= 





nn | tam | worn 


¢ 


s % 
‘d !7 sassoy : uMO, SSP] 
(f) ‘suonotsoy umrureig| *7 aunsodxy L a 


I WTavii 













The next step is to express the indicated pure premiums for each town and
for each class of car as $\theta_j$'s as defined in equation (1). For town A we have an
indicated pure premium of \$33.21 when all classes of cars are combined. This
breaks down as follows:

$$33.21 = \sum E_iP_i/\sum E_i, \qquad i = 1, 4, 8,$$

$$= \Big[\sum E_i(x_i/E_i^{1/2} + 32.44)\Big]/\sum E_i$$

$$= \sum(E_i^{1/2}/\sum E_i)x_i + 32.44.$$

Dividing this by the square root of the sum of the squares of the coefficients,
we obtain,

$$\theta_1 = (\sum E_i)^{1/2}(33.21 - 32.44), \qquad i = 1, 4, 8,$$

$$= \sum\big(E_i^{1/2}/(\sum E_i)^{1/2}\big)x_i,$$

which is of the form of (1). We have entered the coefficients of $\theta_1$ (except for
the common denominator, $(\sum E_i)^{1/2}$, whose square is entered on line (1′)) under
Restriction (1) in the table. Similarly, we have entered the values for the other
towns under Restrictions (2), (3), and (4). The values for the classes of cars
are entered under (5), (6), and (7).

The next step is to orthogonalize the $\theta_j$'s. The first four have no common
elements so they are orthogonal by inspection. To make $\theta_5$ orthogonal to $\theta_1$
we must add to $\theta_5$,

$$k_{51} = -\sum a_{i5}a_{i1}/\sum a_{i1}^2 \tag{10}$$

times $\theta_1$. This and similar coefficients for making $\theta_5$ orthogonal to $\theta_2$, $\theta_3$, and $\theta_4$
are entered on line (2′). We may now replace $\theta_5$ by the equivalent $\theta_5'$ by the
formula

$$a_{i5'} = a_{i5} + k_{51}a_{i1} + k_{52}a_{i2} + k_{53}a_{i3} + k_{54}a_{i4}. \tag{11}$$

Similar $k$'s for $\theta_6$ are entered on line (3′) and $\theta_6$ is replaced by $\theta_6'$. $\theta_7$ should be
ignored since it is algebraically dependent on the other $\theta_j$'s:

$$\theta_7 = \theta_1 + \theta_2 + \theta_3 + \theta_4 - \theta_5 - \theta_6.$$

Note that on line (4′) we have entered $\sum a_{ij}^2$ for checking the calculation (11).

We next calculate the $\theta_j$'s according to the formula,

$$\theta_j = \Big[\sum a_{ij}x_i\Big]\Big/\Big[\sum a_{ij}^2\Big]^{1/2}.$$

Note that for this particular design all the $\theta_j$'s except $\theta_5'$ and $\theta_6'$ are numerically
equal to the corresponding $\Sigma_j$'s (enclosed in parentheses).

Returning to equation (9), we see that it is equivalent to either of the following
restrictions on the $\theta_j$'s:

$$E_A^{1/2}\theta_1 + E_B^{1/2}\theta_2 + E_C^{1/2}\theta_3 + E_D^{1/2}\theta_4 = 0$$

or

$$E_W^{1/2}\theta_5 + E_X^{1/2}\theta_6 + E_Y^{1/2}\theta_7 = 0.$$


Therefore we may conclude that

$$S_1^2/\sigma_x^2 = (\theta_1^2 + \theta_2^2 + \theta_3^2 + \theta_4^2)/\sigma_x^2 = 96{,}469/\sigma_x^2$$

is distributed as is Chi-square with three degrees of freedom. Also we may
conclude that

$$S_2^2/\sigma_x^2 = (\theta_5^2 + \theta_6^2 + \theta_7^2)/\sigma_x^2 = 79{,}349/\sigma_x^2,$$

is distributed as is Chi-square with two degrees of freedom. Note that we have
not proved, and indeed it is not so, that $S_1^2$ and $S_2^2$ are independent.
We have yet to obtain our interaction sum of squares. Equation (5) is of
assistance, here giving,

$$S_3^2/\sigma_x^2 = \Big[\sum x_i^2 - (\theta_1^2 + \theta_2^2 + \theta_3^2 + \theta_4^2 + \theta_5'^2 + \theta_6'^2)\Big]/\sigma_x^2 = \frac{395{,}360 - 173{,}051}{\sigma_x^2} = \frac{222{,}309}{\sigma_x^2}.$$

This is distributed as is Chi-square with 10 − 6 = 4 degrees of freedom. Also
it is independent of $S_1^2$ and of $S_2^2$.
Lastly we form the variance ratios

$$F_1 = \frac{96{,}469/3}{222{,}309/4} = 0.58,$$

$$F_2 = \frac{79{,}349/2}{222{,}309/4} = 0.71,$$

which are not significant.

We therefore conclude that as far as the present data and analysis show, we
have no reason to believe that these three classes of cars and these four towns
are not all subject to the same true premium rate.
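
The arithmetic of the two variance ratios can be reproduced directly from the sums of squares quoted above. This is a minimal sketch; the function name `variance_ratio` is an illustrative choice, and no significance levels are computed here.

```python
def variance_ratio(ss_num, df_num, ss_den, df_den):
    """Ratio of two mean squares (sum of squares divided by degrees of freedom)."""
    return (ss_num / df_num) / (ss_den / df_den)

interaction_ss = 395_360 - 173_051   # equation (5) applied to the data
f_towns = variance_ratio(96_469, 3, interaction_ss, 4)
f_classes = variance_ratio(79_349, 2, interaction_ss, 4)
print(round(f_towns, 2), round(f_classes, 2))
```

Both ratios fall below unity, in line with the conclusion that neither towns nor classes show a significant effect.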








DISTRIBUTIONS IN STRATIFIED SAMPLING 


By Paut H. ANDERSON 


University of Illinois 


1. Introduction. In this paper, distributions of means and standard deviations
will be derived for random and stratified samples. It is not necessary to
define random sampling here, for one may find it defined in any elementary
text. If before drawing the sample from a population $\pi$, it is divided into
several strata $\pi_1, \pi_2, \cdots, \pi_s$, and the sample $\Sigma$ is composed of $s$ partial samples
$\Sigma_1, \Sigma_2, \cdots, \Sigma_s$ each drawn with or without replacement from the strata; and
if the sizes $m_i$ of the partial samples are proportionate to the sizes $M_i$ of corresponding
strata, i.e., $m_i = kM_i$, then the sample which is obtained in this
manner is a stratified sample. When the sizes of the partial samples are not
proportionate to the sizes of the corresponding strata, the distributions of means
and standard deviations will differ from the distributions obtained when the
sizes of the partial samples are proportionate to the sizes of the corresponding
strata. This will be shown in the sections that follow.

The distributions of means and standard deviations from well-known popula- 
tions for stratified and random samples will be derived and compared, as to 
scatter and symmetry. It should be remembered even though stratification 
has little to recommend its use, in some cases, over random sampling, the im- 
possibility of obtaining random samples makes its use necessary. Since most 
of the problems with which the practical statistician is confronted are of the 
kind which make random sampling difficult or even impossible, stratified 
sampling is being investigated by many research workers. 


2. The distribution of means and standard deviations for samples of two
drawn from any population having a continuous frequency function. Let $f(x)$
be a continuous frequency function whose mean is zero, and for $a < x < b$,
let $f(x) > 0$, elsewhere let $f(x) = 0$. We select a sample of two elements $(x_1, x_2)$
which can be represented by a point in a square of side $b - a$, as point P in
Fig. 1. It is well known that the probability of getting a sample point in the
element of area $dx_1\,dx_2$ is $f(x_1)f(x_2)\,dx_1\,dx_2$. The probability of getting a value
of $\bar x$ (mean) less than the value of the mean represented by a point on the line
RT (Fig. 1) whose equation is $x_1 + x_2 = 2\bar x$, is given by

$$\int_a^{2\bar x - a}dx_1\int_a^{2\bar x - x_1}dx_2\,f(x_1)f(x_2). \tag{1}$$

The distribution of $\bar x < \tfrac12(a + b)$ is

$$2\int_a^{2\bar x - a}f(x_1)f(2\bar x - x_1)\,dx_1, \tag{2}$$




which is obtained by differentiating (1) with respect to $\bar x$. For all values
$\bar x > \tfrac12(a + b)$, we must use another equation which we shall now derive similarly.
The probability of obtaining a mean less than the mean of any point on R′T′
(Fig. 1) is

$$1 - \int_{2\bar x - b}^{b}dx_1\int_{2\bar x - x_1}^{b}dx_2\,f(x_1)f(x_2).$$

Differentiating this expression, we obtain

$$2\int_{2\bar x - b}^{b}f(x_1)f(2\bar x - x_1)\,dx_1. \tag{3}$$

Fig. 1


The distribution of means is given by (2) and (3). Let us apply the theorem
to the rectangular population

$$f(x) = \begin{cases}1, & -\tfrac12 < x < \tfrac12 \quad (a = -\tfrac12,\ b = \tfrac12),\\ 0 & \text{elsewhere.}\end{cases}$$

Substituting in (2) and (3) respectively, the results obtained are

$$g(\bar x) = \begin{cases}2(1 + 2\bar x), & \text{for } \bar x < 0,\\ 2(1 - 2\bar x), & \text{for } \bar x > 0.\end{cases}$$

J. O. Irwin [1] and Philip Hall [2] obtained these results also but by different
methods. However, the distribution of $\bar x$ was known to Laplace and other
earlier writers.
From Fig. 1, it is seen that the probability of obtaining a value of $S$ (standard
deviation), less than the value of $S$ on AB whose equation is $x_2 - x_1 = 2S$ is

$$1 - 2\int_a^{b - 2S}dx_1\int_{2S + x_1}^{b}dx_2\,f(x_1)f(x_2).$$

Upon differentiating this expression with respect to $S$, we obtain

$$h(S) = 4\int_a^{b - 2S}f(x_1)f(2S + x_1)\,dx_1. \tag{4}$$

For the rectangular population $h(S) = 4(1 - 2S)$. This result agrees with
that found by P. R. Rider [8].
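
As a rough check, formulas (2), (3) and (4) can be evaluated numerically for the rectangular population on $(-\tfrac12, \tfrac12)$ and compared with the closed forms just stated. The sketch below assumes a simple midpoint rule; the function names and the number of subintervals are illustrative and not part of the paper.

```python
def f(x):
    """Rectangular frequency function on (-1/2, 1/2)."""
    return 1.0 if -0.5 < x < 0.5 else 0.0

def midpoint(func, lo, hi, n=20_000):
    """Midpoint-rule approximation of the integral of func over (lo, hi)."""
    h = (hi - lo) / n
    return h * sum(func(lo + (k + 0.5) * h) for k in range(n))

def mean_density(x_bar, a=-0.5, b=0.5):
    """Formula (2) for x_bar below (a+b)/2, formula (3) above it."""
    if x_bar < 0.5 * (a + b):
        lo, hi = a, 2 * x_bar - a
    else:
        lo, hi = 2 * x_bar - b, b
    return 2 * midpoint(lambda x1: f(x1) * f(2 * x_bar - x1), lo, hi)

def sd_density(s, a=-0.5, b=0.5):
    """Formula (4)."""
    return 4 * midpoint(lambda x1: f(x1) * f(2 * s + x1), a, b - 2 * s)

print(mean_density(-0.2), 2 * (1 + 2 * -0.2))
print(mean_density(0.3), 2 * (1 - 2 * 0.3))
print(sd_density(0.1), 4 * (1 - 2 * 0.1))
```

Replacing `f` by another frequency function on a finite interval (and passing the matching `a` and `b`) gives the corresponding densities under the same formulas.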





3. Sampling from a rectangular population. Let the rectangular population
be $f(x) = 1$, for $0 < x < 1$, elsewhere $f(x) = 0$. From this population we
select a stratified sample of two elements which is chosen so that $0 < x_1 < \tfrac12$
and $\tfrac12 < x_2 < 1$.

Fig. 2

The probability of obtaining a mean less than the mean of
any point on the line RT (Fig. 2) whose equation is $x_1 + x_2 = 2\bar x$, is

$$4\int_0^{2\bar x - \frac12}dx_1\int_{\frac12}^{2\bar x - x_1}dx_2 = 4(2\bar x^2 - \bar x + \tfrac18).$$

Similarly, the probability of obtaining a mean less than the mean of any point
on R′T′ (Fig. 2) whose equation is $x_1 + x_2 = 2\bar x$, is

$$1 - 4\int_{2\bar x - 1}^{\frac12}dx_1\int_{2\bar x - x_1}^{1}dx_2 = 12\bar x - 8\bar x^2 - \tfrac72.$$

Differentiating the right-hand side of the above two equations with respect to
$\bar x$, we get the distribution of means of stratified samples of two elements from a
rectangular distribution function to be




$$g(\bar x) = \begin{cases}16\bar x - 4, & \text{for } \tfrac14 < \bar x < \tfrac12,\\ 12 - 16\bar x, & \text{for } \tfrac12 < \bar x < \tfrac34.\end{cases} \tag{5}$$

The distribution of means for random samples of two elements from the same
rectangular population is

$$g(\bar x) = \begin{cases}4\bar x, & \text{for } 0 < \bar x < \tfrac12,\\ 4 - 4\bar x, & \text{for } \tfrac12 < \bar x < 1.\end{cases} \tag{6}$$

Upon examining (5) and (6) we see that:
A. The stratified sample means are more stable than the random means.
B. The random sample means and the stratified sample means are both
distributed symmetrically.
C. The range of the random means is twice the range of the stratified
means.
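
Statement A can be made quantitative by integrating (5) and (6) numerically. The following sketch, which assumes a midpoint rule on the unit interval and uses illustrative function names, computes the area, mean and variance of each density.

```python
def g_stratified(x):
    """Density (5): means of stratified samples of two."""
    if 0.25 < x < 0.5:
        return 16 * x - 4
    if 0.5 <= x < 0.75:
        return 12 - 16 * x
    return 0.0

def g_random(x):
    """Density (6): means of random samples of two."""
    if 0.0 < x < 0.5:
        return 4 * x
    if 0.5 <= x < 1.0:
        return 4 - 4 * x
    return 0.0

def area_mean_variance(g, n=100_000):
    """Midpoint-rule area, mean and variance of a density on (0, 1)."""
    h = 1.0 / n
    xs = [(k + 0.5) * h for k in range(n)]
    area = sum(g(x) for x in xs) * h
    mean = sum(x * g(x) for x in xs) * h
    var = sum((x - mean) ** 2 * g(x) for x in xs) * h
    return area, mean, var

area_s, mean_s, var_s = area_mean_variance(g_stratified)
area_r, mean_r, var_r = area_mean_variance(g_random)
print(var_r, var_s, var_r / var_s)
```

Under these densities the variance of the random means is 1/24 and that of the stratified means is 1/96, so the stratified means have one quarter of the variance.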

It remains now to find the distributions of the standard deviations for samples
of two elements where one element is selected from each half of the population.
All points on AB (Fig. 2) have the same standard deviation. Furthermore the
equation of AB is $x_2 - x_1 = 2S$. The probability of obtaining a standard
deviation less than the standard deviation of any point on AB (Fig. 2) is

$$4\int_{\frac12 - 2S}^{\frac12}dx_1\int_{\frac12}^{2S + x_1}dx_2 = 8S^2.$$

Furthermore, the probability of getting a standard deviation less than the
standard deviation on the line A′B′ (Fig. 2) of which the equation is
$x_2 - x_1 = 2S$, is

$$1 - 4\int_0^{1 - 2S}dx_1\int_{2S + x_1}^{1}dx_2 = -1 - 8S^2 + 8S.$$

Differentiation of the right-hand side of the above two equations with respect
to $S$ yields the distribution of standard deviations of stratified samples of two
elements from a rectangular distribution function to be

$$h(S) = \begin{cases}16S, & \text{for } 0 < S < \tfrac14,\\ 8 - 16S, & \text{for } \tfrac14 < S < \tfrac12.\end{cases} \tag{7}$$

The distribution of the standard deviations for random samples of two elements is

$$h(S) = 4(1 - 2S), \qquad \text{for } 0 < S < \tfrac12. \tag{8}$$

From (7) and (8) it is easily seen that:
A. The range of the standard deviations for stratified and random samples
is the same.
B. The distribution of standard deviations for random samples of two
elements is skewed, but the distribution of the standard deviations for
stratified samples of two elements is symmetrical.







If we take a random sample of two elements from the rectangular population on
the interval $-\tfrac12 < x < \tfrac12$, then Student's ratio $t = \sqrt2\,\bar x/\sqrt{(x_1 - \bar x)^2 + (x_2 - \bar x)^2}$
will have the distribution

$$F(t) = \begin{cases}1/[2(t - 1)^2], & \text{for } t < 0,\\ 1/[2(t + 1)^2], & \text{for } t > 0.\end{cases}$$

This result was obtained by Laderman [7] and others. According to the reasoning
used by Laderman, the probability of getting a value of $t$ less than the value
on OS is (for stratified samples of two elements)

$$4\int_{-\frac12}^{0}dx_1\int_0^{x_1(t+1)/(t-1)}dx_2 = -\tfrac12(t + 1)/(t - 1).$$

When $t > 0$, the probability of obtaining a value of $t$ greater than the value on
OS is

$$4\int_0^{\frac12}dx_2\int_{x_2(t-1)/(t+1)}^{0}dx_1.$$

It follows easily that the probability of getting a value of $t$ less than the value
represented on OS is for stratified samples equal to

$$1 - 4\int_0^{\frac12}dx_2\int_{x_2(t-1)/(t+1)}^{0}dx_1 = 1 + \tfrac12(t - 1)/(t + 1).$$

Differentiating the right-hand side of the first and third above equations with
respect to $t$, we find the distribution of Student's ratio for stratified samples of
two elements from a rectangular population to be

$$F(t) = \begin{cases}1/(t - 1)^2, & \text{for } -1 < t < 0,\\ 1/(t + 1)^2, & \text{for } 0 < t < 1.\end{cases}$$

Comparing the random sample and stratified sample distributions of $t$, we
find that
A. The stratified $t$'s are more stable than the random $t$'s.
B. Both distributions are symmetrical.
C. The range for the stratified $t$'s is $-1 < t < +1$, while the range for
the $t$'s obtained from random samples is $-\infty < t < +\infty$.
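
The two probability expressions for stratified samples can be compared with a small simulation. This is a hedged sketch: it assumes one element drawn uniformly from each half of $(-\tfrac12, \tfrac12)$ and uses the fact that for two observations with $x_2 > x_1$ Student's ratio reduces to $(x_1 + x_2)/(x_2 - x_1)$. The function names, seed and sample size are arbitrary choices.

```python
import random

def stratified_t_cdf(t):
    """Probability of a ratio below t, from the two expressions in the text (-1 < t < 1)."""
    if t <= 0:
        return -0.5 * (t + 1) / (t - 1)
    return 1 + 0.5 * (t - 1) / (t + 1)

def simulate_t(n_samples, rng):
    """Draw x1 from the lower half and x2 from the upper half; return the ratios."""
    values = []
    for _ in range(n_samples):
        x1 = rng.uniform(-0.5, 0.0)
        x2 = rng.uniform(0.0, 0.5)
        values.append((x1 + x2) / (x2 - x1))
    return values

rng = random.Random(12345)
ts = simulate_t(200_000, rng)
for t0 in (-0.5, 0.0, 0.5):
    empirical = sum(t < t0 for t in ts) / len(ts)
    print(t0, round(empirical, 3), round(stratified_t_cdf(t0), 3))
```

The empirical proportions should track the formula closely, and every simulated ratio lies between −1 and +1, as stated in C.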

By means of a different method, distributions of means of stratified samples
will be obtained. Let (A) and (B) be rectangular populations $f(x)$, $f(y)$ respectively,
with positive values on the interval 0, 1. From the rectangular
population (A) select a stratified sample of two elements $x_1$ and $x_2$ such that
$0 < x_1 < \tfrac12$, $\tfrac12 < x_2 < 1$. Then the probability of getting a sample point in the
element of area $dx_1\,dx_2$ is $4\,dx_1\,dx_2$. Now let $y_1 = 2x_1$ (change of unit of measurement),
$y_2 = 2x_2 - 1$ (change of unit of measurement and translation). Then
$4\,dx_1\,dx_2 = dy_1\,dy_2$. We have also that $0 < y_1 < 1$, $0 < y_2 < 1$. With respect
to the distribution of the means, a stratified sample of two elements from
(A) is the same as a random sample of two elements from (B). Now the means
for random samples of two elements from (B) have the distribution $g(\bar y)$ which
is really expression (6) with $\bar y$ substituted for $\bar x$. Furthermore, we have
$\bar y = \tfrac12(y_1 + y_2) = \tfrac12(2x_1 + 2x_2 - 1) = 2\bar x - \tfrac12$. Hence it follows readily that
$g(\bar x) = 16\bar x - 4$ for $\tfrac14 < \bar x < \tfrac12$, $g(\bar x) = 12 - 16\bar x$ for $\tfrac12 < \bar x < \tfrac34$.

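From the rectangular population (A), take a stratified sample of three elements
$0 < x_1 < \tfrac13$, $\tfrac13 < x_2 < \tfrac23$, $\tfrac23 < x_3 < 1$. The sample points will all lie in a
cube within the unit cube. Then the probability of getting a sample point in
the element of volume $dx_1\,dx_2\,dx_3$ is $27\,dx_1\,dx_2\,dx_3$. Now let $y_1 = 3x_1$, $y_2 =
3x_2 - 1$, $y_3 = 3x_3 - 2$. Therefore $0 < y_i < 1$, for $i = 1, 2, 3$. Furthermore,
$dy_1\,dy_2\,dy_3 = 27\,dx_1\,dx_2\,dx_3$. With respect to the distribution of the means, a
stratified sample of three elements from (A) is the same as a random sample of
three elements from (B). Now the means for random samples of three elements
from (B) have the distribution

$$g(\bar y) = \begin{cases}27\bar y^2/2, & \text{for } 0 < \bar y < \tfrac13,\\ 9(6\bar y - 6\bar y^2 - 1)/2, & \text{for } \tfrac13 < \bar y < \tfrac23,\\ 27(1 - \bar y)^2/2, & \text{for } \tfrac23 < \bar y < 1.\end{cases}$$

We have also $\bar y = 3\bar x - 1$. For $\bar y = 0, \tfrac13, \tfrac23, 1$, $\bar x = \tfrac13, \tfrac49, \tfrac59, \tfrac23$, respectively. Hence

$$g(\bar x) = \begin{cases}81(3\bar x - 1)^2/2, & \text{for } \tfrac13 \le \bar x < \tfrac49,\\ 27(54\bar x - 54\bar x^2 - 13)/2, & \text{for } \tfrac49 < \bar x < \tfrac59,\\ 81(2 - 3\bar x)^2/2, & \text{for } \tfrac59 < \bar x < \tfrac23.\end{cases}$$

Thus we have found the distribution of the means for stratified samples of three
elements when one element is selected from each third of the population.

From the rectangular population (A), take a stratified sample of four elements
$0 < x_1 \le \tfrac14$, $\tfrac14 \le x_2 \le \tfrac12$, $\tfrac12 \le x_3 \le \tfrac34$, $\tfrac34 \le x_4 < 1$. Again, a stratified sample of
four elements from (A) (with respect to the distribution of means) is the same
as a random sample of four elements from (B). The means for random samples
of four elements from (B) have the distribution:

$$\text{(C)}\qquad g(\bar y) = \begin{cases}128\bar y^3/3, & \text{for } 0 < \bar y < \tfrac14,\\ 8[1 - 24(\bar y - \tfrac12)^2 - 48(\bar y - \tfrac12)^3]/3, & \text{for } \tfrac14 < \bar y < \tfrac12,\\ 8[1 - 24(\bar y - \tfrac12)^2 + 48(\bar y - \tfrac12)^3]/3, & \text{for } \tfrac12 < \bar y < \tfrac34,\\ 128(1 - \bar y)^3/3, & \text{for } \tfrac34 < \bar y < 1.\end{cases}$$

Since $\bar y = 4\bar x - \tfrac32$, we have for $\bar y = 0, \tfrac14, \tfrac12, \tfrac34, 1$ respectively, $\bar x = \tfrac38, \tfrac7{16}, \tfrac12, \tfrac9{16},
\tfrac58$. Hence

$$g(\bar x) = \begin{cases}512(4\bar x - \tfrac32)^3/3, & \text{for } \tfrac38 < \bar x < \tfrac7{16},\\ 32[1 - 24(4\bar x - 2)^2 - 48(4\bar x - 2)^3]/3, & \text{for } \tfrac7{16} < \bar x < \tfrac12,\\ 32[1 - 24(4\bar x - 2)^2 + 48(4\bar x - 2)^3]/3, & \text{for } \tfrac12 < \bar x < \tfrac9{16},\\ 512(\tfrac52 - 4\bar x)^3/3, & \text{for } \tfrac9{16} < \bar x < \tfrac58.\end{cases}$$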
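
The change-of-variable argument extends to any number of equal strata with one element drawn from each. The sketch below is an illustration under that assumption: it uses the standard closed form for the density of a sum of $n$ rectangular variables on (0, 1), and the relation $\bar y = n\bar x - (n - 1)/2$, which reduces to $2\bar x - \tfrac12$, $3\bar x - 1$ and $4\bar x - \tfrac32$ for $n = 2, 3, 4$. The function names are hypothetical.

```python
import math

def sum_density(s, n):
    """Density of the sum of n independent rectangular variables on (0, 1)."""
    if s <= 0 or s >= n:
        return 0.0
    total = sum((-1) ** k * math.comb(n, k) * (s - k) ** (n - 1)
                for k in range(int(math.floor(s)) + 1))
    return total / math.factorial(n - 1)

def random_mean_density(y_bar, n):
    """Density of the mean of a random sample of n from population (B)."""
    return n * sum_density(n * y_bar, n)

def stratified_mean_density(x_bar, n):
    """Density of the mean when one element is taken from each of n equal strata of (A)."""
    y_bar = n * x_bar - (n - 1) / 2   # y_i = n*x_i - (i - 1), averaged over i
    return n * random_mean_density(y_bar, n)

print(stratified_mean_density(0.3, 2), 16 * 0.3 - 4)
print(stratified_mean_density(0.4, 3), 81 * (3 * 0.4 - 1) ** 2 / 2)
print(stratified_mean_density(0.5, 4), 32 / 3)
```

Each printed pair compares the general routine with the corresponding closed form given in the text.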
















This is the distribution of the means for stratified samples of four elements (one 
element from each quartile). We can extend this to stratified samples of size 
n where one element is selected from each stratum and there are n strata. As 
nm increases, we note that 

A. The range of the means decreases. 

B. The scatter of the means decreases. 

C. The number of arcs in the distribution of the means increases. 

Take the stratified sample of four elements (two elements from each half), 
0<21<4,0<%<3,3 <2 <1, <a%,< 1. With respect to the distribu- 
tion of the means, a stratified sample of four elements from (A) is the same as 
a random sample of four elements from (B). Now the means for random 
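samples of four elements from (B) have the distribution (C). Furthermore
$\bar y = 2\bar x - \tfrac12$, $d\bar y = 2\,d\bar x$, and for $\bar y = 0, \tfrac14, \tfrac12, \tfrac34, 1$, $\bar x = \tfrac14, \tfrac38, \tfrac12, \tfrac58, \tfrac34$. Thus

$$
g(\bar x) = \begin{cases}
256(2\bar x - \tfrac12)^3/3, & \text{for } \tfrac14 < \bar x < \tfrac38,\\
16[1 - 24(2\bar x - 1)^2 - 48(2\bar x - 1)^3]/3, & \text{for } \tfrac38 < \bar x < \tfrac12,\\
16[1 - 24(2\bar x - 1)^2 + 48(2\bar x - 1)^3]/3, & \text{for } \tfrac12 < \bar x < \tfrac58,\\
256(\tfrac32 - 2\bar x)^3/3, & \text{for } \tfrac58 < \bar x < \tfrac34.
\end{cases}
$$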


Hence the distribution of the means for stratified samples of four elements (two
elements from each half) has been found.

If we take a stratified sample of six elements (three elements from each half) 
we find that the graph of the distribution function of the means will consist of 
six arcs; the range will be $\tfrac14 < \bar x < \tfrac34$. Thus we see that as we take more elements
from each half, the distribution becomes smoother. The number of
arcs in the distribution of the means also increases. The range of the means 
remains the same but scatter decreases as we take more elements from each 
half of the population (A). 

The results so far obtained are true for the rectangular population which is 
symmetric. In order to make further comparisons in the distributions of means 
and standard deviations for stratified and random samples, let us now consider 
a skewed distribution. 





4. Sampling from a skewed population. Let us consider the population
$f(x) = 2x$, $0 < x < 1$, $f(x) = 0$ elsewhere. If we take random samples of two
elements from this population, the points represented by each sample will lie
inside the unit square. For random samples of two elements from this population
the distribution of means will consist of two cubics: $g(\bar x) = 32\bar x^3/3$, for
$0 < \bar x < \tfrac12$, and $g(\bar x) = 16(3\bar x - 2\bar x^3 - 1)/3$, for $\tfrac12 < \bar x < 1$. Furthermore, the
distribution of the standard deviation for random samples of two elements is a
cubic: $h(S) = 16(4S^3 - 3S + 1)/3$, for $0 < S < \tfrac12$. Now we consider the
distribution of means for stratified samples of two elements when one element
is selected from the range ($0 < x_1 < \tfrac12$) which comprises one fourth of the total










population. The other element is selected from the range ($\tfrac12 < x_2 < 1$) which
constitutes three quarters of the total population. By use of the geometric
method the distribution of the stratified means is found to be

$$
g(\bar x) = \begin{cases}
16(32\bar x^3 - 6\bar x + 1)/9, & \text{for } \tfrac14 < \bar x < \tfrac12,\\
16(30\bar x - 9 - 32\bar x^3)/9, & \text{for } \tfrac12 < \bar x < \tfrac34.
\end{cases}
$$


The range of the stratified means is less, and the distribution is more nearly 
symmetrical than it is for the random means as may be seen by comparing the 
graphs of the two distribution functions. Thus we see that stratification gives 
the means greater stability. The distribution of the standard deviations of the 
stratified samples of two elements is: 


$$
h(S) = \begin{cases}
64(3S - 8S^3)/9, & \text{for } 0 < S < \tfrac14,\\
128(4S^3 - 3S + 1)/9, & \text{for } \tfrac14 < S < \tfrac12.
\end{cases}
$$


Upon comparing the distributions of the standard deviations for random and 
stratified samples, we observe that the random case yields a single cubic whereas 
the stratified case yields two cubics. The distribution obtained for the stratified 
case is more symmetrical than it is for the random case as may be seen by 
sketching the graphs of the two distribution functions. The range for both 
distributions is the same. 
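The comparison between the random and the stratified means for the population f(x) = 2x can be checked by integrating the cubic arcs exactly. This is a sketch only: the helper names are illustrative, and each arc is entered as a list of polynomial coefficients (constant term first) taken from the two distributions of means for this population.

```python
from fractions import Fraction as Fr

def integrate_poly(coeffs, a, b, power=0):
    """Exact integral over (a, b) of x**power times the polynomial sum(c_k * x**k)."""
    total = Fr(0)
    for k, c in enumerate(coeffs):
        e = k + power + 1
        total += Fr(c) * (Fr(b) ** e - Fr(a) ** e) / e
    return total

def mass_mean_variance(pieces):
    """pieces: list of (coefficients, lower limit, upper limit) for each arc."""
    m0, m1, m2 = (sum(integrate_poly(c, a, b, p) for c, a, b in pieces) for p in (0, 1, 2))
    return m0, m1, m2 - m1 ** 2

# Random samples of two: 32x^3/3 and 16(3x - 2x^3 - 1)/3.
random_means = [
    ([0, 0, 0, Fr(32, 3)], 0, Fr(1, 2)),
    ([Fr(-16, 3), 16, 0, Fr(-32, 3)], Fr(1, 2), 1),
]
# Stratified samples of two: 16(32x^3 - 6x + 1)/9 and 16(30x - 9 - 32x^3)/9.
stratified_means = [
    ([Fr(16, 9), Fr(-96, 9), 0, Fr(512, 9)], Fr(1, 4), Fr(1, 2)),
    ([Fr(-144, 9), Fr(480, 9), 0, Fr(-512, 9)], Fr(1, 2), Fr(3, 4)),
]

print(mass_mean_variance(random_means))
print(mass_mean_variance(stratified_means))
```

Both distributions integrate to unity, and the variance of the stratified means (11/1296) is well below that of the random means (1/36). The stratified means center on 5/9 rather than on the population mean 2/3, since equal numbers are drawn from strata holding one fourth and three fourths of the population.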


5. Sampling from a normal population. We shall consider a normal population
F having the frequency function $e^{-x^2/2}/\sqrt{2\pi}$, ($-\infty < x < \infty$); and the ith
moment about the mean will be $\mu_i$. Divide this population into two equal
parts $F_1$ and $F_2$ such that the frequency function of $F_1$ is $2e^{-x^2/2}/\sqrt{2\pi}$, ($-\infty <
x < 0$), and the frequency function of $F_2$ is $2e^{-x^2/2}/\sqrt{2\pi}$, ($0 < x < \infty$). The
ith moment of $F_1$ about the origin will be $m_{i1}$, while the ith moment about its
mean will be $\mu_{i1}$; the corresponding ith moments for $F_2$ will be $m_{i2}$ and $\mu_{i2}$ respectively.
In what follows $M_i'$ will be the ith moment about the origin of the
distribution sought, while $M_i$ will be the ith moment about the mean. Furthermore,
the constants $\beta_1$, $\beta_2$, $\kappa$, $S_k$ (measure of skewness) which will be used here
are defined in Elderton [8]. Finally, $E[f(x)]$ will be the expected value of $f(x)$.

If we take a random sample of n elements $x_1, x_2, \cdots, x_n$ from $F_1$ and a
random sample of n elements $x_{n+1}, \cdots, x_{2n}$ from $F_2$, the 2n elements $x_1, \cdots,
x_n, x_{n+1}, \cdots, x_{2n}$ will be a stratified sample from the population F. Let
$\bar x_1 = (1/n)\sum_{i=1}^{n} x_i$, $\bar x_2 = (1/n)\sum_{i=n+1}^{2n} x_i$, and $\bar x = \tfrac12(\bar x_1 + \bar x_2)$; then $\bar x$ will be the
mean of the stratified sample. By using Tchouproff's [6] formulae and expected
values, we obtain the following values:

$$M_1' = E(\bar x) = \tfrac12(m_{11} + m_{12}) = 0,$$
$$M_2 = E(\bar x^2) = (\mu_{21} + \mu_{22})/4n = (1 - m_{12}^2)/2n,$$







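$$M_3 = E(\bar x^3) = (\mu_{31} + \mu_{32})/8n^2 = 0,$$
$$M_4 = E(\bar x^4) = [\mu_{41} + \mu_{42} + 3n(\mu_{21} + \mu_{22})^2 - 3(\mu_{21}^2 + \mu_{22}^2)]/16n^3,$$
$$\beta_1 = M_3^2/M_2^3 = 0, \qquad \beta_2 = M_4/M_2^2 = 3 + 4(\pi - 3)/n(\pi - 2)^2.$$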


From these constants, we see that the variance of the stratified means is
$(1 - 2/\pi)/2n$, but the variance of random means of 2n elements is $1/2n$ as is
well-known. Thus it is obvious that the scatter of the stratified means is less
than the scatter of the random means. Furthermore, the stratified means are
distributed symmetrically since $M_3 = 0$. Observing $\beta_2$, we notice that the
distribution of the stratified means is slightly more peaked than normal. Since
it is well known that random means from a normal population are normally
distributed, the differences between the two distributions are easy to see. As
$n \to \infty$, $\beta_2 \to 3$, so it is reasonably likely that the stratified means tend to be
normally distributed as the size of the sample increases.
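The two constants discussed above are easy to tabulate. The following sketch simply evaluates the stated formulae for the variance and for β₂ of the stratified means; the function name is illustrative and not from the paper.

```python
import math

def stratified_mean_constants(n):
    """Variance and beta_2 of the mean of 2n elements, n drawn from each half of a unit normal."""
    var_stratified = (1 - 2 / math.pi) / (2 * n)
    var_random = 1 / (2 * n)  # random sample of 2n elements
    beta2 = 3 + 4 * (math.pi - 3) / (n * (math.pi - 2) ** 2)
    return var_stratified, var_random, beta2

for n in (1, 2, 5, 25, 100):
    vs, vr, b2 = stratified_mean_constants(n)
    print(n, round(vs / vr, 4), round(b2, 4))
```

The variance ratio stays at 1 − 2/π (about 0.36) for every n, while β₂ falls from about 3.43 at n = 1 toward 3.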

If we select a random sample (x, y) of two elements from the normal population
F, then the variance ($S^2$) will be:

$$S^2 = \tfrac12(x^2 + y^2) - (x + y)^2/4 = (x - y)^2/4.$$

The method of expectations gives us the following values:

$$M_2 = \tfrac12, \qquad M_3 = 1, \qquad M_4 = \tfrac{15}{4},$$
$$\beta_1 = 8, \qquad \beta_2 = 15, \qquad S_k = \frac{\sqrt{\beta_1}\,(\beta_2 + 3)}{2(5\beta_2 - 6\beta_1 - 9)} = \sqrt{2}.$$


Therefore, the skewness of this distribution as measured by Elderton's formula
is 1.414. For a stratified sample, where we select x from $F_1$ and y from $F_2$,
the second, third, and fourth central moments of $S^2$ are:

$$M_2 = (\pi^2 + 2\pi - 2)/2\pi^2,$$
$$M_3 = (4\pi^3 + 7\pi^2 - 12\pi + 8)/4\pi^3,$$
$$M_4 = (15\pi^4 + 30\pi^3 - 40\pi^2 + 24\pi - 12)/4\pi^4.$$

It follows easily that $\beta_1 = 4.71084$, $\beta_2 = 10.28489$, $\kappa = 19.4$, $2\beta_2 - 3\beta_1 - 6 =
.438324$, $S_k = 1.02$. For samples of two elements, the stratified samples yield
a distribution for the variance which is less skewed than the corresponding 
distribution of the variances for random samples. The variances for random 
samples of two elements are distributed as a Type III curve, while the variance 
for stratified samples of two elements is either a Type III or a Type VI curve. 
The difference between the random case and the stratified case as seen from 
this point of view is not clear cut. 
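The constants quoted for the variance of samples of two can be recomputed directly from the central moments. The sketch below is illustrative only; it takes the second moment of the stratified case as (π² + 2π − 2)/2π², which is the value that reproduces the quoted β₁, and uses the skewness measure S_k = √β₁(β₂ + 3)/[2(5β₂ − 6β₁ − 9)].

```python
import math

def elderton_skewness(beta1, beta2):
    """Skewness measure S_k computed from beta_1 and beta_2."""
    return math.sqrt(beta1) * (beta2 + 3) / (2 * (5 * beta2 - 6 * beta1 - 9))

def stratified_variance_betas():
    """beta_1 and beta_2 of S^2 when x is drawn from F1 and y from F2."""
    p = math.pi
    m2 = (p ** 2 + 2 * p - 2) / (2 * p ** 2)
    m3 = (4 * p ** 3 + 7 * p ** 2 - 12 * p + 8) / (4 * p ** 3)
    m4 = (15 * p ** 4 + 30 * p ** 3 - 40 * p ** 2 + 24 * p - 12) / (4 * p ** 4)
    return m3 ** 2 / m2 ** 3, m4 / m2 ** 2

random_sk = elderton_skewness(8, 15)           # random sample of two
b1, b2 = stratified_variance_betas()
stratified_sk = elderton_skewness(b1, b2)      # stratified sample of two
print(random_sk, b1, b2, stratified_sk)
```

The recomputed values agree with the quoted constants to about two decimal places, and the stratified skewness comes out below the random value of 1.414.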

It is interesting to see what sort of bias is introduced by taking n elements of
the sample from $F_1$ and by taking 2n elements of the sample from $F_2$. Under
these circumstances, the complete sample will contain 3n elements, and the mean







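of the sample will be $\bar x = \sum_{i=1}^{3n} x_i/3n = (\bar x_1 + 2\bar x_2)/3$. As before, the central
moments and the $\beta$'s are found to be:

$$M_2 = (\mu_{21} + 2\mu_{22})/9n = \mu_{22}/3n, \qquad M_3 = (\mu_{31} + 2\mu_{32})/27n^2 = \mu_{32}/27n^2,$$

$$M_4 = [\mu_{42} - 3\mu_{22}^2 + 9n\mu_{22}^2]/27n^3,$$

$$\beta_1 = \mu_{32}^2/27n\mu_{22}^3, \qquad \beta_2 = \mu_{42}/3n\mu_{22}^2 - 1/n + 3.$$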


We notice first that the means are not symmetrically distributed for small values
of n since $\beta_1 \ne 0$, but as $n \to \infty$, $\beta_1 \to 0$, so the means tend to be symmetrically
distributed. It is evident also that $\beta_2 \to 3$ with increasing n; consequently,
the bias which is present for small values of n tends to disappear as n increases.
Incorrect proportioning of the sizes of the partial samples in stratified sampling
introduces an error into the results whose magnitude decreases with an increase
in n.


6. Sampling from a population $y = \phi(x)$. Suppose we have a well-behaved
frequency function $\phi(x)$ of which the first four moments are finite. Furthermore,
it will be required that $\phi(x)$ be continuous and Riemann-integrable.
Divide the total x-axis into K parts $I_1, I_2, \cdots, I_K$ with the separating points
$a_1, a_2, \cdots, a_{K-1}$ in such manner that

$$\int_{-\infty}^{a_1} \phi(x)\,dx = \cdots = \int_{a_{K-1}}^{\infty} \phi(x)\,dx = 1/K.$$

In this section, we extend some of the definitions of the last section; $\mu_{it}$ will be
the ith moment about the mean of the tth part $I_t$, and $m_{it}$ will be the ith moment
about the origin of the tth part $I_t$. Take a sample of Kn elements from this
population so that n elements are drawn from each part. The mean of this
sample will be $\bar x = \sum_{i=1}^{Kn} x_i/Kn$. We write this as $\bar x = \sum_{t=1}^{K} \bar x_t/K$, where
$\bar x_t = \sum_{j=nt-n+1}^{nt} x_j/n$. It follows easily then that:

$$M_2 = \sum_{t=1}^{K} \mu_{2t}/K^2 n, \qquad M_3 = \sum_{t=1}^{K} \mu_{3t}/K^3 n^2,$$

$$M_4 = \Big[\sum_{t=1}^{K} \mu_{4t} + 3n\Big(\sum_{t=1}^{K} \mu_{2t}\Big)^2 - 3\sum_{t=1}^{K} \mu_{2t}^2\Big]\Big/K^4 n^3,$$

and, as $n \to \infty$, $\beta_1 \to 0$ and $\beta_2 \to 3$. Therefore, it is evident that if we divide a
population into K equal parts and take a sample of Kn elements (n elements
from each part), the distribution of the means probably tends to normal as the
number of elements in the sample increases.
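The general formulae can be exercised on the case K = 2 of the preceding section, where each stratum is one half of a unit normal population. In this sketch the function name and argument layout are illustrative, β₁ and β₂ are formed as M₃²/M₂³ and M₄/M₂², and the half-normal central moments are written in terms of m = √(2/π), the first moment of the half about the origin.

```python
import math

def stratified_mean_betas(mu2, mu3, mu4, n):
    """beta_1, beta_2 of the mean of K*n elements; mu2, mu3, mu4 list the strata central moments."""
    K = len(mu2)
    M2 = sum(mu2) / (K ** 2 * n)
    M3 = sum(mu3) / (K ** 3 * n ** 2)
    M4 = (sum(mu4) + 3 * n * sum(mu2) ** 2 - 3 * sum(v * v for v in mu2)) / (K ** 4 * n ** 3)
    return M3 ** 2 / M2 ** 3, M4 / M2 ** 2

# Central moments of the positive half of a unit normal (the negative half mirrors mu3).
m = math.sqrt(2 / math.pi)
half_mu2 = 1 - m ** 2
half_mu3 = m * (2 * m ** 2 - 1)
half_mu4 = 3 - 2 * m ** 2 - 3 * m ** 4

for n in (1, 5, 50):
    b1, b2 = stratified_mean_betas([half_mu2] * 2, [-half_mu3, half_mu3], [half_mu4] * 2, n)
    print(n, b1, b2)
```

For two half-normal strata this reproduces β₁ = 0 and β₂ = 3 + 4(π − 3)/n(π − 2)², and β₂ approaches 3 as n grows.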


7. Summary. Distributions of means for stratified samples have been ob- 
tained for the rectangular population which is symmetric and also for a triangular 
population which can be considered an example of a J-shaped population. For 







both populations, the means obtained from stratified samples show less variability
than the means of random samples. The stratified sample means obtained
from the skewed population exhibit less skewness than do the random
sample means obtained from the same population.

The effect of stratification in sampling upon the distribution of the standard 
deviations is to make the distribution more symmetric. This is true for the 
three populations investigated. 

For stratified samples from the rectangular population Student’s ratio is 
much more stable than it is for random samples of the same size. 

Thus it is evident that stratified samples possess advantages over random 
samples of a nature that makes stratified samples worthy of use in research work 
where it is easy to obtain them. 


In conclusion, the author is grateful to Professor A. R. Crathorne for suggesting
the problem of this paper and guiding it to its conclusion.


REFERENCES 


[1] J. O. Irwin, "On the frequency distribution of the means of samples from a population
having any law of frequency with finite moments, with special reference to
Pearson's Type II," Biometrika, Vol. 19 (1927), pp. 225-239.

[2] Philip Hall, "The distribution of means for samples of size N drawn from a population
in which the variate takes values between 0 and 1, all such values being equally
probable," Biometrika, Vol. 21 (1927), pp. 240-244.

[3] Paul R. Rider, "On the distribution of the ratio of the mean to standard deviation in
small samples from non-normal universes," Biometrika, Vol. 21 (1929), pp.
124-143.

[4] J. Neyman, "On the two different aspects of the representative method," Roy. Stat.
Soc. Jour., Vol. 97 (1934), pp. 558-625.

[5] A. E. R. Church, "On the means and squared standard deviations," Biometrika, Vol. 8
(1926), pp. 321-394.

[6] A. A. Tchouproff, "On the mathematical expectation of the moments of frequency
distributions," Biometrika, Vol. 12 (1918-19), pp. 140-169, 185-210.

[7] Jack Laderman, "The distribution of Student's ratio for samples of two items drawn
from non-normal universes," Annals of Math. Stat., Vol. 10 (1939), pp. 376-380.

[8] W. Palin Elderton, Frequency Curves and Correlation, London: Charles and Edwin
Layton, 1927 (Second Edition), pp. 239.









SOLUTION OF A MATHEMATICAL PROBLEM CONNECTED WITH 
THE THEORY OF HEREDITY¹


By S. BERNSTEIN 


TRANSLATED BY EMMA LEHNER 
University of California, Berkeley 


Translator's Note: Although a French resumé of the article here translated
appeared in Comptes Rendus,² it is so condensed due to space restrictions that in
reporting on Bernstein's work for the Statistical Seminar at the University of
California, it became necessary to refer to the original Russian paper. Because
of the obvious language difficulty together with the extreme rareness³ of the
Ukrainian publication in this country, and because of the current interest in
the application of statistical theories to genetics, it seemed advisable to make
this important article available to a much larger class of readers.

It is regretted that, due to the present conditions, it was impracticable to
obtain the author's comments on this translation, and it is hoped that the slight
changes and additions inserted, to clarify some of the more difficult passages,
would have met with his approval.


1. Let us consider N classes of individuals which possess the property that
the cross of any two of these individuals produces an individual belonging to
one of the above N classes. We will call such a set of classes a "closed biotype."
We will suppose only that the probability of obtaining an individual of class j
as a result of crossing two individuals of classes i and k has some definite value
$A_{ik}^{j} = A_{ki}^{j}$, and we will call these probabilities⁴ "heredity coefficients of a given
biotype." It follows from the definition of a closed biotype that

$$\sum_{j=1}^{N} A_{ik}^{j} = 1. \tag{1}$$

Let $\alpha_j$ be the probability that an individual belongs to class j, then under
panmixia⁵ the probability of belonging to class j in the next generation is given by

$$\alpha_j' = \sum_{i,k} A_{ik}^{j} \alpha_i \alpha_k. \tag{2}$$

1 The original was published in the Annales Scientifiques de l'Ukraine, Vol. 1 (1924),
pp. 83-114.

2 C. R. Ac. Sc., Vol. 177, pp. 528-531, 581-584.

3 Thanks are due to the Brown University Mathematical Library for their loan of this
rare periodical.

4 $A_{ik}^{j}$ is, of course, the relative probability that an offspring belong to class j, given
that the parents belong to classes i and k.

5 That is, complete absence of selection.







Similarly we have for the next generation

$$\alpha_j'' = \sum_{i,k} A_{ik}^{j} \alpha_i' \alpha_k' \tag{3}$$

and so on.
The problem which we now propose is as follows:
For what heredity coefficients under panmixia will the distribution of probabilities
achieved in the second generation remain unaltered in all subsequent generations?


If the heredity coefficients satisfy this condition, then the law of heredity 
which corresponds to them is called “stable.” 


2. We prove first of all that the Mendelian law is stable. The Mendelian law 
has to do with three classes of individuals, the first two of which are pure races, 
while the third is a hybrid race such that the cross of an individual of the first 
class with an individual of the second class always produces an individual of the 
third class. It follows therefore that 


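$$A_{11}^{1} = A_{22}^{2} = A_{12}^{3} = 1, \text{ while}$$

$$A_{11}^{2} = A_{11}^{3} = A_{22}^{1} = A_{22}^{3} = A_{12}^{1} = A_{12}^{2} = 0.$$

The remaining 9 coefficients are defined as follows:

$$A_{13}^{1} = A_{13}^{3} = A_{23}^{2} = A_{23}^{3} = A_{33}^{3} = 1/2,$$

$$A_{33}^{1} = A_{33}^{2} = 1/4, \text{ while } A_{13}^{2} = A_{23}^{1} = 0.$$

If, for simplicity, we denote the probabilities of belonging to the first, second
or third class by $\alpha, \beta, \gamma$, then (2) becomes

$$\alpha' = (\alpha + \tfrac12\gamma)^2, \qquad \beta' = (\beta + \tfrac12\gamma)^2, \qquad \gamma' = 2(\alpha + \tfrac12\gamma)(\beta + \tfrac12\gamma), \tag{4}$$

while on iteration we get the equivalent of (3), namely

$$
\begin{aligned}
\alpha'' &= [(\alpha + \tfrac12\gamma)^2 + (\alpha + \tfrac12\gamma)(\beta + \tfrac12\gamma)]^2 = (\alpha + \tfrac12\gamma)^2(\alpha + \beta + \gamma)^2,\\
\beta'' &= [(\beta + \tfrac12\gamma)^2 + (\alpha + \tfrac12\gamma)(\beta + \tfrac12\gamma)]^2 = (\beta + \tfrac12\gamma)^2(\alpha + \beta + \gamma)^2,\\
\gamma'' &= 2[(\alpha + \tfrac12\gamma)^2 + (\alpha + \tfrac12\gamma)(\beta + \tfrac12\gamma)][(\beta + \tfrac12\gamma)^2 + (\alpha + \tfrac12\gamma)(\beta + \tfrac12\gamma)]\\
&= 2(\alpha + \tfrac12\gamma)(\beta + \tfrac12\gamma)(\alpha + \beta + \gamma)^2.
\end{aligned}
\tag{5}
$$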


Since $\alpha + \beta + \gamma = 1$, it follows that $\alpha'' = \alpha'$, $\beta'' = \beta'$, $\gamma'' = \gamma'$, and hence
the Mendelian law is stable.
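The stability of the Mendelian law can be illustrated by iterating (4) with exact fractions. This is only a numerical sketch; the function name `mendel` and the starting distribution are arbitrary choices made for illustration.

```python
from fractions import Fraction as Fr

def mendel(alpha, beta, gamma):
    """One generation under panmixia, formulas (4)."""
    a = alpha + gamma / 2
    b = beta + gamma / 2
    return a * a, b * b, 2 * a * b

p0 = (Fr(1, 5), Fr(3, 10), Fr(1, 2))   # arbitrary initial probabilities summing to 1
p1 = mendel(*p0)                        # first filial generation
p2 = mendel(*p1)                        # second filial generation
print(p1, p2)
```

The distribution changes from `p0` to `p1` and then remains fixed, which is the stability property proved above.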


3. The first rather important result can be stated as follows: 
THEOREM: If three classes form a closed biotype under a stable heredity law, 
which is such that the cross of an individual of the first class with an individual of 
the second class always produces an individual of the third class, then the first two 
classes represent pure races and the law of heredity is the Mendelian law. 
















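If the original probabilities are $\alpha, \beta, \gamma$, then the corresponding probabilities
for the next generation can be written as follows:

$$
\begin{aligned}
\alpha_1 &= A_{11}\alpha^2 + 2A_{12}\alpha\beta + A_{22}\beta^2 + 2A_{13}\alpha\gamma + 2A_{23}\beta\gamma + A_{33}\gamma^2 = f(\alpha, \beta, \gamma),\\
\beta_1 &= B_{11}\alpha^2 + 2B_{12}\alpha\beta + B_{22}\beta^2 + 2B_{13}\alpha\gamma + 2B_{23}\beta\gamma + B_{33}\gamma^2 = \phi(\alpha, \beta, \gamma),\\
\gamma_1 &= C_{11}\alpha^2 + 2C_{12}\alpha\beta + C_{22}\beta^2 + 2C_{13}\alpha\gamma + 2C_{23}\beta\gamma + C_{33}\gamma^2 = \psi(\alpha, \beta, \gamma),
\end{aligned}
\tag{6}
$$

where $A_{ik} + B_{ik} + C_{ik} = 1$. Since $C_{12} = 1$, by assumption, it follows that

$$B_{12} = A_{12} = 0,$$

since all the coefficients must be positive, or zero.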

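The mathematical problem before us consists in determining three homogeneous
quadratic forms $f$, $\phi$, and $\psi$ in $\alpha, \beta, \gamma$ with non-negative coefficients such
that

$$f + \phi + \psi = 1 = (\alpha + \beta + \gamma)^2$$

and satisfying the conditions of stability, namely

$$
\begin{aligned}
f(\alpha_1, \beta_1, \gamma_1) &= f(\alpha, \beta, \gamma) = \alpha_1,\\
\phi(\alpha_1, \beta_1, \gamma_1) &= \phi(\alpha, \beta, \gamma) = \beta_1,\\
\psi(\alpha_1, \beta_1, \gamma_1) &= \psi(\alpha, \beta, \gamma) = \gamma_1,
\end{aligned}
\tag{7}
$$

for all $\alpha, \beta, \gamma$ such that⁶ $\alpha + \beta + \gamma = 1$. The third equation is, of course, a
consequence of the first two.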

The functions $f$, $\phi$ and $\psi$, being continuous, assume infinitely many values,
unless they are constants, in which case they may be expressed as quadratic
forms by

$$f = p(\alpha + \beta + \gamma)^2, \qquad \phi = q(\alpha + \beta + \gamma)^2, \qquad \psi = r(\alpha + \beta + \gamma)^2,$$

where p, q, r are constants. But, since $A_{12} = B_{12} = 0$ and $C_{12} = 1$, so that
the term in $\alpha\beta$ is absent from $f$ and $\phi$ and present in $\psi$, this reduces to the
trivial case $f = 0$, $\phi = 0$ and $\psi = 1$, which we can neglect.

We now write (7) in the form 


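$$
\begin{aligned}
\alpha_1 &= f(\alpha, \beta, \gamma) = \alpha(\alpha + \beta + \gamma) + F_1(\alpha, \beta, \gamma),\\
\beta_1 &= \phi(\alpha, \beta, \gamma) = \beta(\alpha + \beta + \gamma) + F_2(\alpha, \beta, \gamma),\\
\gamma_1 &= \psi(\alpha, \beta, \gamma) = \gamma(\alpha + \beta + \gamma) - F_1(\alpha, \beta, \gamma) - F_2(\alpha, \beta, \gamma).
\end{aligned}
\tag{8}
$$

Since

$$f(\alpha, \beta, \gamma) = \alpha(\alpha + \beta + \gamma), \qquad \phi(\alpha, \beta, \gamma) = \beta(\alpha + \beta + \gamma), \qquad \psi(\alpha, \beta, \gamma) = \gamma(\alpha + \beta + \gamma), \tag{9}$$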


6 The fact that the variables $\alpha, \beta, \gamma$ are not independent does not preclude the validity
of identifying their coefficients in the equations that follow, since all these equations are 
homogeneous. 
















are obviously solutions of (7), it follows that $F_1(f, \phi, \psi) = 0$ and $F_2(f, \phi, \psi) = 0$.
But, as we have just seen, $f$, $\phi$ and $\psi$ assume infinitely many values. Therefore
$F_1$ and $F_2$ either have a linear factor in common, or else are proportional and
irreducible.⁷

We first show that $F_1$ and $F_2$ do not have only a linear factor, $l\alpha + m\beta + n\gamma$,
in common, for if they did this factor would vanish for $\alpha = f$, $\beta = \phi$, $\gamma = \psi$
so that

$$l f(\alpha, \beta, \gamma) + m\phi(\alpha, \beta, \gamma) + n\psi(\alpha, \beta, \gamma) = 0. \tag{10}$$





But since neither $f$ nor $\phi$ has a term in $\alpha\beta$, while $\psi$ has, $n = 0$. Also, since
$f$ and $\phi$ have no negative coefficients, l and m are of opposite signs. Let $l \ge 0$,
while $m = -p$, where $p \ge 0$. Then the third equation (8) can be written

$$\psi(\alpha, \beta, \gamma) = \gamma(\alpha + \beta + \gamma) + (A\alpha + B\beta + C\gamma)(l\alpha - p\beta). \tag{11}$$

The coefficients of $\alpha^2$ and $\beta^2$ in $\psi$ must be non-negative. Therefore it follows
that $A \ge 0$, while $B \le 0$. But the coefficient of $\alpha\beta$ in $\psi$ is 2, while $Bl - Ap$
cannot be positive. Therefore $F_1$ and $F_2$ have no linear factor in common, and
must be proportional. But since the coefficient of $\alpha\beta$ in $f$ and $\phi$ is zero, the
coefficient of $\alpha\beta$ in both $F_1$ and $F_2$ must be $-1$, and therefore $F_1$ and $F_2$ are
equal and we can write $F_1 = F_2 = F$, and (8) becomes:



























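$$
\begin{aligned}
f(\alpha, \beta, \gamma) &= \alpha(\alpha + \beta + \gamma) + F(\alpha, \beta, \gamma),\\
\phi(\alpha, \beta, \gamma) &= \beta(\alpha + \beta + \gamma) + F(\alpha, \beta, \gamma),\\
\psi(\alpha, \beta, \gamma) &= \gamma(\alpha + \beta + \gamma) - 2F(\alpha, \beta, \gamma),
\end{aligned}
\tag{12}
$$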








where F is an irreducible, homogeneous, quadratic form in $\alpha, \beta, \gamma$. Furthermore,
the coefficient of $\alpha^2$ in F must be zero, since were it positive, the coefficient
of $\alpha^2$ in $f$ would exceed 1, and were it negative, the coefficient of $\alpha^2$ in $\phi$ would
also be negative, which is impossible. Similarly, the coefficient of $\beta^2$ in F is
also zero. We can therefore write F in the form

$$F(\alpha, \beta, \gamma) = -\alpha\beta + c\alpha\gamma + d\beta\gamma + e\gamma^2. \tag{13}$$





Moreover, we know that 











$$F(\alpha', \beta', \gamma') = F(\alpha S + F, \beta S + F, \gamma S - 2F) = 0, \tag{14}$$

for all values of $\alpha, \beta, \gamma$, such that $\alpha + \beta + \gamma = S = 1$. Expanding (14) in
Taylor series about the point $(\alpha S, \beta S, \gamma S)$ in three space we get only three terms
in the expansion, since all the derivatives of order greater than the second are
identically zero, and the constant term can be obtained very simply by putting
$\alpha = \beta = \gamma = 0$ in $F(\alpha S + F, \beta S + F, \gamma S - 2F)$. In this way we have





7 See Bôcher, Introduction to Higher Algebra, p. 210, Theorem 3.


















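$$
\begin{aligned}
F(\alpha S + F, \beta S + F, \gamma S - 2F) &= F(\alpha S, \beta S, \gamma S)\\
&\quad + F[F_\alpha(\alpha S, \beta S, \gamma S) + F_\beta(\alpha S, \beta S, \gamma S) - 2F_\gamma(\alpha S, \beta S, \gamma S)]\\
&\quad + F(F, F, -2F) = 0.
\end{aligned}
\tag{15}
$$

Since F is a homogeneous form of the second degree

$$F(\alpha S, \beta S, \gamma S) = S^2 F(\alpha, \beta, \gamma); \qquad F(F, F, -2F) = F^2 F(1, 1, -2), \tag{16}$$

while its derivatives with respect to $\alpha, \beta, \gamma$ are homogeneous linear forms so that

$$F_\alpha(\alpha S, \beta S, \gamma S) = S F_\alpha(\alpha, \beta, \gamma). \tag{17}$$

Substituting these in (15) and dividing out an $F(\alpha, \beta, \gamma)$, which is not identically
zero, we get

$$S^2 + S[F_\alpha(\alpha, \beta, \gamma) + F_\beta(\alpha, \beta, \gamma) - 2F_\gamma(\alpha, \beta, \gamma)] = -F(\alpha, \beta, \gamma)F(1, 1, -2). \tag{18}$$

But since $F(\alpha, \beta, \gamma)$ is irreducible, $F(1, 1, -2)$ must be zero. Dividing by S
we finally get

$$S = 2F_\gamma - F_\alpha - F_\beta \tag{19}$$

or $(\alpha + \beta + \gamma) = 2(c\alpha + d\beta + 2e\gamma) - (-\beta + c\gamma) - (-\alpha + d\gamma)$ from which
it follows that

$$c = d = 0, \qquad e = 1/4,$$

and hence

$$F(\alpha, \beta, \gamma) = \gamma^2/4 - \alpha\beta. \tag{20}$$

Therefore we have found that

$$
\begin{aligned}
f(\alpha, \beta, \gamma) &= \alpha(\alpha + \beta + \gamma) + \gamma^2/4 - \alpha\beta = \alpha^2 + \alpha\gamma + \gamma^2/4 = (\alpha + \tfrac12\gamma)^2,\\
\phi(\alpha, \beta, \gamma) &= \beta(\alpha + \beta + \gamma) + \gamma^2/4 - \alpha\beta = \beta^2 + \beta\gamma + \gamma^2/4 = (\beta + \tfrac12\gamma)^2,\\
\psi(\alpha, \beta, \gamma) &= \gamma(\alpha + \beta + \gamma) - \tfrac12\gamma^2 + 2\alpha\beta = 2(\alpha + \tfrac12\gamma)(\beta + \tfrac12\gamma),
\end{aligned}
\tag{21}
$$

which is the Mendelian law.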


4. We have therefore shown that the Mendelian law is a necessary consequence
of any stable law, which is such that the cross of the first two classes
produces the third hybrid class. We have not even assumed a priori that the
first two classes are pure races. From a theoretical point of view it is interesting
to investigate the possibility of crossing two pure races under different laws of
heredity, which are nevertheless stable.

We will therefore suppose to start with that the coefficients of $\alpha^2$ in $f(\alpha, \beta, \gamma)$











58 S. BERNSTEIN 





and of $\beta^2$ in $\phi(\alpha, \beta, \gamma)$ are equal to unity. Beginning with equations (8) of the
previous section, which merely express the condition that the heredity law under
consideration be stable, we can write

$$F_1 = F = -a\alpha\beta + c\alpha\gamma + d\beta\gamma + e\gamma^2. \tag{22}$$


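As before, $F_1$ and $F_2$ cannot have a linear factor in common, hence they are
proportional, and we can write $F_2 = \lambda F$, $F_1 = F$. We therefore have five
coefficients to determine: a, c, d, e, and $\lambda$. Since

$$F(\alpha S + F, \beta S + \lambda F, \gamma S - (\lambda + 1)F) = 0 \tag{23}$$

we have an analogue of (19)

$$S = (1 + \lambda)F_\gamma - F_\alpha - \lambda F_\beta \tag{24}$$

or

$$\alpha + \beta + \gamma = (1 + \lambda)(c\alpha + d\beta + 2e\gamma) + a\beta - c\gamma + \lambda a\alpha - \lambda d\gamma.$$

Equating coefficients of $\alpha, \beta, \gamma$ as before we have

$$
\begin{aligned}
1 &= c(1 + \lambda) + a\lambda \quad\text{or}\quad c = (1 - \lambda a)/(1 + \lambda),\\
1 &= d(1 + \lambda) + a \quad\text{or}\quad d = (1 - a)/(1 + \lambda),\\
1 &= 2e(1 + \lambda) - c - \lambda d = 2e(1 + \lambda) - (1 - \lambda a)/(1 + \lambda) - \lambda(1 - a)/(1 + \lambda),
\end{aligned}
$$

or

$$e = (1 + \lambda - a\lambda)/(1 + \lambda)^2. \tag{25}$$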


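Therefore the most general quadratic form F satisfying our conditions can be
written

$$F(\alpha, \beta, \gamma) = -a\alpha\beta + \frac{1 - \lambda a}{1 + \lambda}\alpha\gamma + \frac{1 - a}{1 + \lambda}\beta\gamma + \frac{1 + \lambda - a\lambda}{(1 + \lambda)^2}\gamma^2. \tag{26}$$

If we let $a\lambda = b$, this becomes

$$F(\alpha, \beta, \gamma) = -a\alpha\beta + \frac{a(1 - b)}{a + b}\alpha\gamma + \frac{a(1 - a)}{a + b}\beta\gamma + \frac{a(a + b - ab)}{(a + b)^2}\gamma^2. \tag{27}$$

Substituting this value of F into $f$, $\phi$, $\psi$ and simplifying we get

$$
\begin{aligned}
f(\alpha, \beta, \gamma) &= \Big(\alpha + \frac{a}{a + b}\gamma\Big)\Big[\alpha + (1 - a)\beta + \Big(1 - \frac{ab}{a + b}\Big)\gamma\Big],\\
\phi(\alpha, \beta, \gamma) &= \Big(\beta + \frac{b}{a + b}\gamma\Big)\Big[(1 - b)\alpha + \beta + \Big(1 - \frac{ab}{a + b}\Big)\gamma\Big],\\
\psi(\alpha, \beta, \gamma) &= (a + b)\Big(\alpha + \frac{a}{a + b}\gamma\Big)\Big(\beta + \frac{b}{a + b}\gamma\Big),
\end{aligned}
\tag{28}
$$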























where in order that all the coefficients be positive it is necessary and sufficient
that $0 < a \le 1$, and $0 < b \le 1$. In case $a = b = 1$ formulas (28) coincide
with (21) and we get the Mendelian law.
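Formulas (28) can be tried numerically. The sketch below assumes the reading of (28) in which f = (α + aγ/(a+b))[α + (1−a)β + (1 − ab/(a+b))γ], φ is the same with the roles of (a, α) and (b, β) exchanged, and ψ = (a+b)(α + aγ/(a+b))(β + bγ/(a+b)); the function name and the sample values of a, b and the starting probabilities are illustrative only.

```python
from fractions import Fraction as Fr

def two_pure_race_law(a, b, alpha, beta, gamma):
    """One generation under formulas (28), with parameters 0 < a, b <= 1."""
    a, b = Fr(a), Fr(b)
    s = a + b
    u = alpha + a * gamma / s
    v = beta + b * gamma / s
    w = (1 - a * b / s) * gamma
    f = u * (alpha + (1 - a) * beta + w)
    phi = v * ((1 - b) * alpha + beta + w)
    psi = s * u * v
    return f, phi, psi

p0 = (Fr(1, 5), Fr(3, 10), Fr(1, 2))
q1 = two_pure_race_law(Fr(1, 2), Fr(1, 3), *p0)
q2 = two_pure_race_law(Fr(1, 2), Fr(1, 3), *q1)
mendelian = two_pure_race_law(1, 1, *p0)       # a = b = 1
print(q1, q2, mendelian)
```

With a = 1/2 and b = 1/3 the second generation reproduces the first, and with a = b = 1 the result coincides with the Mendelian values (α + γ/2)², (β + γ/2)², 2(α + γ/2)(β + γ/2).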

The question of whether there actually exist laws which satisfy (28)
with $a < 1$, and $b < 1$ can only be solved experimentally. Theoretically formulas
(28) give the most general heredity law of a closed biotype consisting of
three classes, with the condition that two of the three classes be pure races. It
is easy to see that the only law of heredity in which all three classes are pure
races is given by the particular solution of (8)

$$f = \alpha(\alpha + \beta + \gamma), \qquad \phi = \beta(\alpha + \beta + \gamma), \qquad \psi = \gamma(\alpha + \beta + \gamma), \tag{29}$$

in which $F_1 = F_2 = 0$.


5. Supposing as before that the heredity law is stable, it remains to prove
the following theorem to exhaust all possible biotypes consisting of only three
classes.

THEOREM: If all classes are hybrid, then

$$f = p(\alpha + \beta + \gamma)^2, \qquad \phi = q(\alpha + \beta + \gamma)^2, \qquad \psi = r(\alpha + \beta + \gamma)^2. \tag{30}$$

If only one of the classes represents a pure race, then either

$$
\begin{aligned}
f &= (\alpha + \beta)[\tfrac12(1 + b)(\alpha + \beta) + (1 - d)\gamma],\\
\phi &= (\alpha + \beta)[\tfrac12(1 - b)(\alpha + \beta) + d\gamma],\\
\psi &= \gamma(\alpha + \beta + \gamma),
\end{aligned}
\tag{31}
$$

or

$$f = \alpha S + a\alpha(\mu\beta + \gamma) \quad\text{and}\quad \mu\phi + \psi = 0. \tag{32}$$

We have seen that if $f$, $\phi$, and $\psi$ are functions of $(\alpha + \beta + \gamma)$, then we arrive
at (30), in the contrary case we arrive at (8). Here we distinguish two cases:
1) $F_1$ and $F_2$ are irreducible quadratic forms which are proportional: $F_1 = k_1F$,
$F_2 = k_2F$, and 2) $F_1$ and $F_2$ have a common factor, which is a linear form. Suppose
at first that F is a quadratic form. If none of the numbers $k_1$, $k_2$, and
$k_1 + k_2$ is zero, then two of them may be taken as positive, say $k_1$ and $k_2$. But
then the coefficients of $\alpha^2$ and $\beta^2$ in F would have to vanish in order that $\psi$ have
no negative coefficients. But this case of two pure races has already been
discussed, and leads to formulas (28). We must therefore suppose next that
one of the numbers $k_1$, $k_2$, or $k_2 + k_1$ is zero. Suppose that $k_1 + k_2 = 0$, that
is, that the third class is a pure race, and hence the coefficient of $\gamma^2$ in $\psi$ is unity.
Therefore, the coefficient of $\gamma^2$ in F must be zero. We can take $k_1 = 1$, then
$k_2 = -1$, and therefore the coefficient of $\alpha\gamma$ in F is negative, say $-d$. We can
now write













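$$F = a\alpha^2 + b\alpha\beta + c\beta^2 - d\alpha\gamma + e\beta\gamma. \tag{33}$$

We have as before

$$F(\alpha S + F, \beta S - F, \gamma S) = 0, \tag{34}$$

from which we derive by Taylor's expansion

$$S = F_\beta - F_\alpha \tag{35}$$

or in other words

$$\alpha + \beta + \gamma = b\alpha + 2c\beta + e\gamma - b\beta + d\gamma - 2a\alpha,$$

which leads to

$$F = \tfrac12(b - 1)\alpha^2 + b\alpha\beta + \tfrac12(b + 1)\beta^2 - d\alpha\gamma + (1 - d)\beta\gamma \tag{36}$$

and hence to $f$ and $\phi$, which are as follows,

$$
\begin{aligned}
f &= (\alpha + \beta)[\tfrac12(1 + b)(\alpha + \beta) + (1 - d)\gamma],\\
\phi &= (\alpha + \beta)[\tfrac12(1 - b)(\alpha + \beta) + d\gamma].
\end{aligned}
\tag{37}
$$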



















It now remains to suppose that F is a linear form. Let

$$F = \lambda\alpha + \mu\beta + \gamma. \tag{38}$$

Here the condition that the heredity law be stable leads as before to the equation

$$S = (k + k_1)F_\gamma - kF_\alpha - k_1F_\beta = (k + k_1) - \lambda k - \mu k_1, \tag{39}$$

where k and $k_1$ are linear forms

$$k = a\alpha + b\beta + c\gamma, \qquad k_1 = a_1\alpha + b_1\beta + c_1\gamma. \tag{40}$$













Hence if we had no restrictions on signs and magnitudes we could select k
arbitrarily, and then we would have $k_1 = [S + (\lambda - 1)k]/[1 - \mu]$, and the
solution for $f$, $\phi$, $\psi$ would depend on five parameters, $(\lambda, \mu, a, b, c)$.

But since in $f = \alpha S + kF$, the coefficients of $\beta^2$, $\beta\gamma$ and $\gamma^2$ are non-negative,
$\mu b \ge 0$, and $b + \mu c \ge 0$, $c \ge 0$, and similarly from the same property of $\phi$ we
have $\lambda a_1 \ge 0$, $c_1 \ge 0$, $a_1 + \lambda c_1 \ge 0$. But $\mu$ and $\lambda$ cannot both be non-negative,
for then $\lambda f + \mu\phi + \psi = 0$ would be impossible.

Let $\mu < 0$, then $b = c = 0$, but then the coefficient of $\alpha^2$ in $f$ would be $1 + a\lambda$,
which will be too big, unless $\lambda = 0$. Hence, $F = \mu\beta + \gamma$, $k = a\alpha$, and

$$
\begin{aligned}
f &= \alpha S + a\alpha(\mu\beta + \gamma),\\
\psi &= -\mu\phi = \mu[S(\beta + \gamma) - a\alpha(\mu\beta + \gamma)]/[\mu - 1].
\end{aligned}
\tag{41}
$$


Hence we have exhausted all possible cases and have proved our theorem. 
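The two laws with a single pure race can be checked for stability in the same numerical way. The sketch assumes (31) with ψ = γ(α + β + γ), and the second law in the form f = αS + aα(μβ + γ), φ = [S(β + γ) − aα(μβ + γ)]/(1 − μ), ψ = −μφ; the function names and the parameter values (b = 1/2, d = 1/4, a = 1/2, μ = −1/2) are arbitrary illustrations, not values taken from the paper.

```python
from fractions import Fraction as Fr

def law_31(b, d, alpha, beta, gamma):
    """Formulas (31): the third class is the pure race."""
    w = alpha + beta
    f = w * ((1 + b) * w / 2 + (1 - d) * gamma)
    phi = w * ((1 - b) * w / 2 + d * gamma)
    psi = gamma * (alpha + beta + gamma)
    return f, phi, psi

def law_41(a, mu, alpha, beta, gamma):
    """Second law of the theorem, with mu < 0 and lambda = 0."""
    s = alpha + beta + gamma
    f = alpha * s + a * alpha * (mu * beta + gamma)
    phi = (s * (beta + gamma) - a * alpha * (mu * beta + gamma)) / (1 - mu)
    psi = -mu * phi
    return f, phi, psi

p0 = (Fr(1, 5), Fr(3, 10), Fr(1, 2))
r1 = law_31(Fr(1, 2), Fr(1, 4), *p0)
r2 = law_31(Fr(1, 2), Fr(1, 4), *r1)
s1 = law_41(Fr(1, 2), Fr(-1, 2), *p0)
s2 = law_41(Fr(1, 2), Fr(-1, 2), *s1)
print(r1, r2, s1, s2)
```

In both cases the probabilities sum to unity and the second generation repeats the first.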













6. We can summarize our results as follows. The heredity laws of a closed 
biotype of three classes which are stable can be divided into the following types: 

1. Two classes represent pure races. The heredity laws are given by (28), 
and in particular for the Mendelian case by (21). 

2. There are no pure races, and every race can be obtained by crossing the 
other races. The heredity law is given by (30). 

3. All three classes are pure races. The heredity law is given by (29). Any 
two classes of this biotype, also form a closed biotype. 

4. One of the classes is a pure race. The heredity laws are given by (31) 
and (32). 





SOME RECENT ADVANCES IN MATHEMATICAL STATISTICS, I 


By Burton H. Camp¹


Wesleyan University 


The papers considered in this partial review are listed at the end. For the 
most part they have appeared within the last five years, but in order to explain 
what has been done within the last five years it has been necessary occasionally 
to use material that appeared earlier. The subject matter is divided into 
four parts. 


Part I. The Theory of Tests. Since an attempt is being made to present
the material of this paper in such a form that it may be read rapidly by those
who have not read the underlying literature, the author will endeavor to do little
more, in Part I, than to define and illustrate several terms which are being used.
Altogether there are nine of these terms. It is fortunate that their meanings
can be explained pretty well by reference to an extremely simple picture. Let
each of the curves in the figure indicate a probability distribution $p(x; \theta)$, in
which there is a single variate x and a single parameter $\theta$.


Example 1. $p(x, \theta) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac12 (x - \theta)^2}$, the normal distribution in which the
center is at $x = \theta$, and the standard deviation is unity.

[Figure: probability distributions $p(x; \theta)$ for the parameter values $\theta_0$ and $\theta_1$.]

Let a random sample E be drawn from a population indicated by such a
curve. In the simplest case $E = x$, a single individual. Shortly, we shall have
to suppose that there are N individuals: $E = x_1, \cdots, x_N$. Eventually, the
picture will be generalized much further. The population will be described by a
function of n variables, so that, in place of each x of our sample, we shall have


1 One of two papers read by Cecil C. Craig and by the author at a joint meeting of the 
Institute of Mathematical Statistics, the Econometric Society and the American Statistical 
Association, held in New York City on December 30, 1941. 




$x^1, \cdots, x^n$; moreover there will be, not one parameter, but l parameters
$\theta^1, \cdots, \theta^l$; so that our probability distribution will be multivariate and will
be indicated by

$$p(x^1, \cdots, x^n;\ \theta^1, \cdots, \theta^l).$$

A common way of putting this is to say that x and $\theta$ are vectors in n and l
dimensions, respectively, and to leave the form as originally, $p(x; \theta)$. In the
figure the space which the samples ($E = x$) can occupy is of course not more
than the x-axis, but in the most general case the sample space will be a part or
all of a space of nN dimensions and will be denoted by W. As is well understood,
a significance test is an inequality which specifies in W a certain region
w as a critical region, and if E is in this w, the hypothesis being tested is rejected.
For example, in the figure, one might test the hypothesis $H_0$ that $\theta = \theta_0$. The
rejection region $w_0$ might be the part of the x-axis where $x > 2$. In all such
cases we shall let $\alpha$ equal the probability that E is in w if $\theta = \theta_0$. This statement
will be denoted as follows:

$$\alpha = P(w \mid \theta_0), \tag{1}$$


P standing for probability. 

(i) Power of a test. A good test should satisfy two conditions: (a) if our
sample is drawn from the population specified by $\theta_0$, the hypothesis $H_0$ that
$\theta = \theta_0$ should be accepted as often as possible, and (b) if our sample is drawn
from a population specified by some other value of $\theta$, say $\theta_1$, then the hypothesis
that $\theta = \theta_1$ should also be accepted as often as possible. Suppose first that
there are but these two admissible populations. The probability of (a) is $1 - \alpha$.
We commonly make the artificial requirement that this shall be some large
fraction such as 0.99. The probability of (b) is commonly denoted by $\beta$, and
in the figure, when $w = w_0$, $\beta$ is the area under the $\theta_1$ curve which lies to the
right of $x = 2$. Relative to $\theta_0$, $\theta_1$, and $\alpha$, the quantity $\beta$ is called the power
of that test which designates $w_0$ as the critical region. Also, $\alpha$ and $(1 - \beta)$
are the probabilities of the so-called errors of the first and second kinds,
respectively.
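For the normal curves of Example 1 both α and the power β are tail areas and can be computed directly. The sketch below uses the standard-library `statistics.NormalDist`; the function name and the numerical values θ₀ = 0, θ₁ = 2 and the cut-off x = 2 are illustrative assumptions, not values read from the figure.

```python
from statistics import NormalDist

def size_and_power(theta0, theta1, cutoff):
    """alpha = P(x > cutoff | theta0) and power beta = P(x > cutoff | theta1), unit variance."""
    alpha = 1 - NormalDist(mu=theta0, sigma=1).cdf(cutoff)
    beta = 1 - NormalDist(mu=theta1, sigma=1).cdf(cutoff)
    return alpha, beta

alpha, beta = size_and_power(theta0=0.0, theta1=2.0, cutoff=2.0)
print(alpha, beta)   # error of the first kind: alpha; error of the second kind: 1 - beta
```

Note that β here is the power, as in the text, so the probability of an error of the second kind is 1 − β.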

(ii) Unbiased test. As stated, we would like to have $\beta$ large. In any case
we would like to have $\beta \ge \alpha$. If $\beta \ge \alpha$, the test and the corresponding region
$w_0$ are "unbiased" (relative to the preassigned quantities $\theta_0$, $\theta_1$, and $\alpha$). The
region $w_0$ appears to be unbiased in our figure. This definition can obviously
be extended to the case where, in addition to $\theta_1$, there is an infinity of admissible
values of $\theta$; then the test is unbiased relative to the whole family of admissible
values of $\theta$ if, for every one of these $\theta$'s, $\beta \ge \alpha$.

(iii) UMP test and CBC region. If, with respect to a family of admissible
$\theta$'s, a critical region $w_0$ exists such that, for each of these $\theta$'s ($\ne \theta_0$), $\beta$ is greater
than it would be for any other critical region satisfying (1), then this $w_0$ is said
to be the common best critical (CBC) region and the corresponding test is the
uniformly most powerful (UMP) test.














(iv) UMPU test and CBCU region. If there is no CBC region, still it may
happen that, if one restricts one’s view to only unbiased regions, there may be 
among them a CBC region. Such a region is said to be a common best critical 
unbiased (CBCU) region, and the corresponding test is the uniformly most 
powerful unbiased (UMPU) test. 

In the following examples, and elsewhere, we shall now use $H_0$ to indicate the
hypothesis being tested, $H^*$ to indicate all admissible alternatives.

Example 2: $p(x; \theta)$ normal as in Example 1, $E = x$, $H_0$: $\theta = \theta_0$, $H^*$: $\theta > \theta_0$.
The CBC region is where $x > k$ if

$$\int_k^{\infty} p(x; \theta_0)\, dx = \alpha.$$

This region is the interval indicated by $w_0$ in the figure.

Example 3: Same as the preceding example except that now we have as
$H^*$: $\theta \ne \theta_0$. There is no CBC region, but the CBCU region consists of two tail
intervals, where $|x| > k$ if

$$\int_k^{\infty} p(x; \theta_0)\, dx = \tfrac{1}{2}\alpha.$$


A little reflection will convince the reader that the statements in these two 
examples are at least apparently true. It is geometrically evident, for example, 
that the last mentioned region (two tail intervals) is not as powerful with respect
to the alternatives of Example 1 ($\theta > \theta_0$) as is the single tail region $w_0$ in the
figure.
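
The geometric remark can be checked numerically. The sketch below assumes one normal observation with unit standard deviation and writes `delta` for $\theta - \theta_0$; the function names and the choice $\alpha = 0.05$ are illustrative, not taken from the paper.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal; unit standard deviation is an assumption


def one_tail_power(delta, alpha):
    """Power of the region x - theta0 > k when the true shift is delta."""
    k = STD.inv_cdf(1.0 - alpha)
    return 1.0 - STD.cdf(k - delta)


def two_tail_power(delta, alpha):
    """Power of the region |x - theta0| > k, each tail having size alpha / 2."""
    k = STD.inv_cdf(1.0 - alpha / 2.0)
    return (1.0 - STD.cdf(k - delta)) + STD.cdf(-k - delta)


alpha = 0.05
for delta in (-1.0, 0.0, 0.5, 1.0, 2.0):
    print(f"delta = {delta:+.1f}  one tail: {one_tail_power(delta, alpha):.4f}  "
          f"two tails: {two_tail_power(delta, alpha):.4f}")
```

For positive shifts the single tail is the more powerful; for negative shifts its power falls below $\alpha$, which is why it cannot serve when $H^*$ is two-sided, whereas the two-tail region never falls below $\alpha$.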

(v) Type A regions. It is often difficult to find even a CBCU region, or such
a region may not exist, but it may be that there is a region which has the required
properties if one admits only values of $\theta$ near to the value $\theta_0$ being tested. Type
A regions have this property. More precisely, they have the property that
the power of $w_0$ is a minimum at $\theta_0$ with respect to small changes in $\theta$, and that
this is a sharper minimum at $\theta_0$ than is the power of any other $w$ which satisfies
equation (1). Here the words "small changes" are used as in the calculus.
The full definition [4] of an unbiased region of type A is that it shall satisfy (1)
and also the following conditions:

(2) $\theta$ shall be a single parameter (not a vector),

(3) $\dfrac{\partial}{\partial \theta} P(w_0 \mid \theta) = 0$ if $\theta = \theta_0$,

(4) $\dfrac{\partial^2}{\partial \theta^2} P(w_0 \mid \theta) \ge \dfrac{\partial^2}{\partial \theta^2} P(w \mid \theta)$ when $\theta = \theta_0$, for all regions $w$ which satisfy
the preceding conditions imposed on $w_0$.

There are also other types of regions
designated by $A_1$, B, C, and D, which resemble Type A [9]. The following
example illustrates Type A; it is a familiar problem with an unfamiliar solution [4].


Example 4. $p(x; \sigma) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-x^2/2\sigma^2}$; $E = x_1, \ldots, x_N$; $H_0$: $\sigma = \sigma_0$;
$H^*$: $\sigma \ne \sigma_0$. The CBCU region of type A is determined by two tail areas
(but they are not equal tail areas) of the distribution of $\sum x^2$.

(vi) Test unbiased in the limit [11]. (vii) Asymptotically MP test [15]. (viii)
Asymptotically MPU test [15]. In these cases the complete definitions are too
lengthy to be repeated here, and they cannot be recapitulated briefly. The
general idea is that, if none of the regions of the preceding types exist, still it
may be true that there are regions which do have approximately the desired
properties if $E = x_1, \ldots, x_N$, and $N$ is large. The following example [11]
illustrates (vi).

$E = x_1, \ldots, x_N$; $H_0$: $\theta = 0$;
$H^*$: $\theta \ne 0$. Regions of Type A unbiased in the limit are defined by the
inequality,


vs 1 x , 5N N 
tase ae pat t(ZiHy) aM 3 ~ 2 
Here M is a quantity that has to be approximated and tabulated. The in- 
equality is not simple, but it furnishes a definite answer to the problem. 

(ix) Regions similar to sample space. All the preceding definitions apply to
the case where $x$ is a vector in $n$ space, but not all to the case where $\theta$ is a vector
in $l$ space. Suppose now that this is the case, or, as we have said before, that
there are $l$ different parameters $\theta^{(1)}, \ldots, \theta^{(l)}$, each being capable of taking on a
variety of values. Suppose we fix our attention on $\theta^{(1)}$ and wish to test the
hypothesis that $\theta^{(1)} = \theta_0^{(1)}$. First of all we wish to find a critical region $w_0$ for
which an equation like (1) will be true, independently of what the values of the
other parameters may be. Such a region is said to be "similar" to sample
space; the "similarity" consists in the fact that the equation like (1) would be
true independently of the other parameters, if $w_0$ were replaced by all of sample
space $W$, and if $\alpha = 1$. Feller [10] has shown that there are simple cases in
which there is no region similar to sample space. He and others have investigated
the conditions under which such regions do exist. "Generally speaking
it seems that for most of the probability laws $p(x, \theta^{(1)}, \ldots, \theta^{(l)})$ in which the
composite probability law for sample space is made up by multiplication,

(2) $\prod_{i=1}^{N} p(x_i \mid \theta^{(1)}, \ldots, \theta^{(l)})$,

there do exist such similar regions, at least if $N > 1$."


Example 5. p(x; 6) = : 


Part II. Estimation. (i) Estimation by interval. So far we have been considering
possible answers to the question: Shall specified values of $\theta^{(1)}, \ldots, \theta^{(l)}$
be accepted? The totality of values of the $\theta$'s which are so acceptable might
be called the acceptable point set in parameter ($l$-dimensional) space. This
point set is determined by the sample or experiment ($E$), and usually different
point sets are determined by different $E$'s. Frequently this set of points constitutes
a simple closed region, or, in the case of only one parameter, it may be a
single interval. Such an interval is called a fiducial or confidence interval.
The fundamental property of such a point set or interval is well known, but has
to be stated with some care: If $\alpha = 0.01$, and if one is about to take a sample
from a population in which the true values of the parameters $\theta^{(1)}, \ldots, \theta^{(l)}$ are
$\theta_0^{(1)}, \ldots, \theta_0^{(l)}$, then the probability is 0.99 that the sample will be such that the
point set determined by it will contain this true parameter point $\theta_0^{(1)}, \ldots, \theta_0^{(l)}$.


It does not matter whether or not one knows what these true values of the 
parameters are. If there is more than one parameter, the fiducial interval for 
one of these parameters often does not exist; that is, there is often no such in- 
terval which is independent of the values of the other parameters. The question 
whether there is such an interval is obviously connected with the question 
whether there are regions similar to sample space. But if one fiducial interval 
does exist, then usually there are an infinity of them, and our problem is to 
choose the best one. This problem is called ‘estimation by interval.” One 
answer is to choose the shortest interval. More precisely one should say, the 
shortest system of intervals. One gets a system of intervals by fixing a but 
not E. What is desired is a formula which will give the shortest interval for 
every E, but it may well happen that one formula (system) will supply the 
shortest intervals for some E’s, and another will supply the shortest intervals 
for other E’s. The choice between the two systems will then depend on the 
relative frequency with which the shortest intervals will be supplied by one
system or by the other. 

Example 6: $p(x; \xi, \sigma)$ is normal, $\xi$ indicating the mean and $\sigma$ the standard
deviation. Given $E = x_1, \ldots, x_N$; to estimate $\xi$. The shortest system of
confidence intervals does not exist (independently of $\sigma$).

Example 7. Same as Example 6, except that now one seeks only an upper
limit to the confidence interval which the parameter must not exceed. Then
the shortest system (best one-sided estimate) is: $\xi \le \bar{x} + ts$, where Fisher's $t$
and $s$ are meant; $t$ corresponds to a preassigned $\alpha$, and $\bar{x}$ is the mean of the
sample.

In cases like Example 6, where the shortest system does not exist, Neyman
[7] defines a "short unbiased system."

Example 8. The short unbiased system for Example 6 is: $\bar{x} - ts \le \xi \le
\bar{x} + ts$, ($t$, $s$, $\bar{x}$) as in Example 7.
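
The fundamental property stated in Part II (i) can be illustrated by simulation for the interval of Example 8. This is a sketch under stated assumptions: $s$ is taken to be the estimated standard error of the mean, $N = 10$, $\alpha = 0.05$, and the constant 2.262 is the tabulated two-sided 5 per cent point of Student's $t$ for 9 degrees of freedom. The function names and the parameter values tried are hypothetical.

```python
import math
import random

T_CRIT = 2.262  # tabulated two-sided 5% point of Student's t, 9 degrees of freedom


def interval(sample, t=T_CRIT):
    """Return (xbar - t*s, xbar + t*s); s is assumed to be the standard error of xbar."""
    n = len(sample)
    xbar = sum(sample) / n
    ss = sum((x - xbar) ** 2 for x in sample)
    s = math.sqrt(ss / (n - 1) / n)
    return xbar - t * s, xbar + t * s


def coverage(xi, sigma, n=10, trials=20000, seed=1):
    """Fraction of simulated samples whose interval contains the true mean xi."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(xi, sigma) for _ in range(n)]
        lo, hi = interval(sample)
        hits += lo <= xi <= hi
    return hits / trials


# The observed coverage should be near 0.95 whatever the true xi and sigma are.
print(coverage(xi=5.0, sigma=2.0))
print(coverage(xi=-1.0, sigma=0.5))
```

The point of trying two different pairs ($\xi$, $\sigma$) is that the coverage does not depend on the unknown parameters, which is the sense in which the interval for $\xi$ exists independently of $\sigma$.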

(ii) Single estimators. Suppose that, as before, we have a sample ($E$) and
wish to choose the best single value for one of the parameters, not as before its
best fiducial interval. It is well known that there often exists a fiducial function
$g(\theta)$ which, like a probability function, is everywhere positive or zero and
has an integral,

$$\int g(\theta)\, d\theta = 1,$$

and is further useful in determining confidence intervals. In particular, if $\theta$ is a
location parameter and if the composite probability function is as in (2), with
only one parameter $\theta$: $g(\theta) = k\, p(x_1 - \theta) \cdots p(x_N - \theta)$, $k$ being a constant.
An estimate commonly thought of as best is the maximum likelihood estimate:
this is the mode of $g(\theta)$. Other estimates that have interesting properties are
the mean and the median of $g(\theta)$. Pitman [14] defines a new "best" estimate
$\theta_s$. This has the property that, for every $h > 0$, $\theta_s$ is within $h$ of the
true value $\theta$ more often than is any other estimate. More precisely, if

$$P(|\theta_s - \theta| \le h) \ge P(|\theta_e - \theta| \le h)$$

for all positive values of $h$, and if the inequality sign between the $P$'s holds for
some positive value of $h$, $\theta_e$ being every other estimate, then $\theta_s$ is the "best"
estimate. As before $P$ stands for probability.
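
The fiducial function of a location parameter, $g(\theta) = k\, p(x_1 - \theta) \cdots p(x_N - \theta)$, and its mode, mean, and median can be approximated on a grid. In the sketch below the double-exponential form of $p$, the five observations, the grid limits, and the function names are all assumptions chosen for illustration; none of them comes from the paper.

```python
import math


def fiducial_summary(xs, log_p, lo, hi, steps=20001):
    """Mode, mean and median of g(theta) = k * prod p(x_i - theta) on a grid."""
    h = (hi - lo) / (steps - 1)
    grid = [lo + i * h for i in range(steps)]
    log_g = [sum(log_p(x - t) for x in xs) for t in grid]
    top = max(log_g)
    w = [math.exp(v - top) for v in log_g]   # unnormalized g, scaled to avoid underflow
    k = 1.0 / (sum(w) * h)                   # constant making the Riemann sum of g equal 1
    g = [k * v for v in w]
    mode = grid[max(range(steps), key=w.__getitem__)]
    mean = sum(t * gv for t, gv in zip(grid, g)) * h
    acc, median = 0.0, grid[-1]
    for t, gv in zip(grid, g):               # first grid point with half the mass below it
        acc += gv * h
        if acc >= 0.5:
            median = t
            break
    return mode, mean, median


def log_laplace(u):
    """Log of the assumed density p(u) = exp(-|u|) / 2."""
    return -abs(u) - math.log(2.0)


xs = [0.3, 1.1, 1.4, 2.9, 6.0]  # made-up observations
mode, mean, median = fiducial_summary(xs, log_laplace, lo=-5.0, hi=15.0)
print(f"mode = {mode:.3f}, mean = {mean:.3f}, median = {median:.3f}")
```

With this skew sample the three summaries differ, the mode (the maximum likelihood estimate) falling at the sample median; for a normal $p$ all three would coincide with the sample mean.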

Example 9. If $p(x; \xi, \sigma)$ is normal and the sample $E = x_1, \ldots, x_N$, the


2 2 2 

“best” estimate of o* is "5 , instead of the usual estimates: eo4 : at : 
(iii) Weight function. Wald [13] defines a weight function $V(\theta, \theta_E)$ which
depends on the seriousness of the error committed when the estimate $\theta_E$ is used
in place of the true value of the parameter $\theta$. The sample $E = x_1, \ldots, x_N$;
and $\theta$ may be a vector. Thence he defines a risk function,

$$r(\theta) = \int_W V \cdot p(x_1, \ldots, x_N; \theta)\, dW,$$

and the "best" $\theta_E$ as that value of $\theta_E$ which minimizes the total risk,

$$\int r(\theta) f(\theta)\, d\theta,$$

this integral being taken over all of the parameter space, and $f(\theta)$ being the
a priori distribution of $\theta$. It is undesirable to introduce $f(\theta)$, but it can be
shown that, subject to slight restrictions on the nature of $f$, one can obtain a
best estimate by finding a value $\theta_E$ which for all $\theta$'s makes $r$ equal to a constant
and also satisfies other general conditions; this equation and these conditions
do not contain $f(\theta)$. In a symmetrical but otherwise fairly general case $\theta_E$ is
the maximum likelihood solution.
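
A risk function of this kind can be estimated by simulation. The sketch below assumes a squared-error weight function, normal observations with unit standard deviation, and the sample mean as the estimate; these choices and all names are illustrative assumptions, not Wald's own. In this case the risk is known to equal $\sigma^2/N$ for every $\theta$, so the estimate makes $r$ equal to a constant.

```python
import random


def squared_error(theta, estimate):
    """Assumed weight function V(theta, theta_E) = (theta_E - theta)^2."""
    return (estimate - theta) ** 2


def sample_mean(xs):
    return sum(xs) / len(xs)


def risk(theta, estimator, weight, n=10, sigma=1.0, trials=20000, seed=0):
    """Monte Carlo estimate of r(theta), the expected weight under p(x; theta)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        xs = [rng.gauss(theta, sigma) for _ in range(n)]
        total += weight(theta, estimator(xs))
    return total / trials


# For each theta the estimated risk should be close to sigma^2 / n = 0.1.
for theta in (-2.0, 0.0, 5.0):
    print(theta, round(risk(theta, sample_mean, squared_error), 4))
```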


Part III. Likelihood Tests. This part has to do mostly with special cases
of likelihood tests. As is well known, this test consists in selecting a critical
rejection region $w$ in sample space where

(a) $P(w \mid H_0) = \alpha$,

(b) the relative likelihood of $H_0$ is small; more precisely, where $\lambda <$ constant,
and

$$\lambda = \frac{\max_{\theta \in \omega} P(E \mid \theta)}{\max_{\theta \in \Omega} P(E \mid \theta)},$$

$\omega$ being the region in parameter space specified by the hypothesis tested $H_0$,
and $\Omega$ being the region in parameter space specified by all admissible hypotheses.
(In special cases max is replaced by least upper bound.) If $H_0$ is simple ($\omega$ being
a point) and if the CBC region $w$ exists, then $w$ is bounded by the contour,
$\lambda$ = constant [19]. Otherwise this $\lambda$ test does not necessarily yield the same
critical regions as do any of the preceding tests. But it is generally much
easier to apply, and, in many of the cases that follow, these $\lambda$ tests are good ones
as judged by the preceding theory. They are powerful even if they are not the
most powerful of all tests, and often this power can be found and tabulated.
In fact Wilks [29] has shown that the appropriate distribution of $\lambda$ (omitting
terms of order $1/\sqrt{N}$) can be found² if the distribution of $E$ is

$$\prod_{i=1}^{N} p(x_i, \theta^{(1)}, \ldots, \theta^{(l)}), \qquad (N \text{ large})$$

and³ if the optimum estimates $\hat{\theta}^{(1)}, \ldots, \hat{\theta}^{(l)}$ exist and are distributed (except for
certain terms of order $1/\sqrt{N}$) normally. This theorem has now been generalized
by Wald, in a paper presented to the American Mathematical Society in
December, 1941.
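
The $\lambda$ criterion can be written out for the simplest case. The sketch below assumes a normal population with known unit standard deviation, a simple $H_0$: $\theta = \theta_0$, and $\Omega$ the whole real line; the observations and function names are made up. It refers $-2 \log \lambda$ to $\chi^2$ with one degree of freedom, which in this particular case holds exactly rather than only to terms of order $1/\sqrt{N}$.

```python
import math
from statistics import NormalDist, fmean


def log_likelihood(xs, theta, sigma=1.0):
    """Log of P(E | theta) for independent normal observations."""
    dist = NormalDist(theta, sigma)
    return sum(math.log(dist.pdf(x)) for x in xs)


def lambda_test(xs, theta0, sigma=1.0):
    """Return (lam, stat, p) for H0: theta = theta0 against all real theta."""
    theta_hat = fmean(xs)  # the sample mean maximizes the likelihood over Omega
    log_lam = log_likelihood(xs, theta0, sigma) - log_likelihood(xs, theta_hat, sigma)
    stat = max(-2.0 * log_lam, 0.0)  # guard against tiny negative rounding error
    # chi-square with 1 degree of freedom: P(chi2 > c) = 2 * (1 - Phi(sqrt(c)))
    p = 2.0 * (1.0 - NormalDist().cdf(math.sqrt(stat)))
    return math.exp(log_lam), stat, p


xs = [0.8, 1.9, 0.4, 1.3, 2.2, 0.9, 1.6, 1.1]  # made-up observations
lam, stat, p = lambda_test(xs, theta0=0.0)
print(f"lambda = {lam:.3g}, -2 log lambda = {stat:.3f}, p = {p:.3g}")
```

Here $-2 \log \lambda$ reduces to $N(\bar{x} - \theta_0)^2/\sigma^2$, so small $\lambda$ corresponds to a sample mean far from $\theta_0$ in either direction.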

There are many of these tests, made to fit all sorts of hypotheses. The author
will try to summarize a considerable group of them; all members of this group
might be called generalizations of the Student-Fisher $t$-test. They fall naturally
into two classes, according as to whether the individuals of the sample are taken
from a univariate or from a multivariate universe. Unless otherwise stated all
universes shall be normal. $H_0$ shall stand for the hypothesis being tested, and
$H^*$ for all admissible alternatives to $H_0$.

(i) Univariate case. The sample consists of $N$ elements, as before, $x_1, \ldots, x_N$,
chosen independently from $N$ normal populations indicated by their parameters
$(\xi_1, \sigma_1), \ldots, (\xi_N, \sigma_N)$. About these populations we may ask a variety of
questions resulting in a variety of problems and tests.

Problem a: If the populations are all identical $(\xi, \sigma)$, does $\xi = \xi_0$ (specified
in advance)? This results in the well-known $t$-test. The hypothesis tested $H_0$
is that $\xi = \xi_0$, and the alternative hypothesis $H^*$ is that $\xi \ne \xi_0$; it being assumed
at the outset that all the populations are identical. The $t$-test has been shown
to be an UMPU test relative to $H^*$.

Problems b, c, d: Let these same samples be arranged in $k$ groups or "columns"

$$\begin{matrix} x_1^{(1)} & \cdots & x_1^{(k)} \\ \vdots & & \vdots \\ x_{n_1}^{(1)} & \cdots & x_{n_k}^{(k)} \end{matrix}$$

where the $n_i$ are not necessarily all equal. Let it be assumed that the populations
$(\xi, \sigma)$ do not change within the columns. Problems (b), (c) and (d), with
their corresponding tests, may be indicated as follows:

(b) Are $(\xi, \sigma)$ constant from column to column? (The $\lambda_H = L$ test.)


² Distribution of $(-2 \log \lambda)$ is like that of $\chi^2$ except for terms of order $1/\sqrt{N}$.

³ See Doob's conditions, Transactions of American Mathematical Society, Vol. 36 (1934),
pp. 759-775.








(c) Is $\sigma$ constant from column to column regardless of what values the $\xi$'s
may have? (The $\lambda_{H_1} = L_1$ test.)
(d) Is $\xi$ constant from column to column assuming the $\sigma$'s constant? (The
$\lambda_{H_2} = L_2$ test.)
In Problem (b), $H_0$ is that $(\xi, \sigma)$ are constant, $H^*$ that they are not constant.
In Problem (c), $H_0$ is that $\sigma$ is constant, $H^*$ that it is not constant. In Problem
(d), $H_0$ is that $\xi$ is constant, $H^*$ that it is not constant. The test of Problem (c)
has recently been shown to be unbiased only if the numbers in all the columns
are the same ($n_1 = \cdots = n_k$). It is, however, unbiased in the limit. Power
tables were published in 1937 [23]. Bartlett's (1937) $\mu$ is another test for this
problem, and Pitman's [36] $L$ test is another, but it has been shown that these
two tests are equivalent. Both are unbiased; they are not likelihood tests.
This problem is frequently called the problem of the "homogeneity" of a set of
variances.
All these tests are, of course, functions of the observations, and the details
are readily available in the papers listed. For example, Pitman's


= 34N log = Wi — 2(n log 5). 


where $S_i$ is what he calls the "squariance" for the $i$th column, and a large value
of $L$ is significant. The squariance is what the physicists had called and what
statisticians ought therefore to have called the second moment, viz.: $N\mu_2$; $\mu_2$ is
really the unit second moment.

(e) Linear Hypothesis. Problems like the above, and many others, can be
included in a general theorem by Kolodziejczyk, who showed how to write out
quite simply the likelihood test if each $\xi$ is a linear function of $l$ parameters
($l < N$) and if the hypothesis $H_0$ specifies the values of $r$ different linear functions
of the $\theta$'s ($r \le l$). Furthermore, the power of this test (with numerous
applications) was discussed and tabulated by Tang in an important paper [39].

Problem (f). This method (e) has been used by Neyman [43] to test the
homogeneity of a set of variances, the problem already studied by a number of
authors. It has been stated that some of their tests were unbiased with respect
to the alternative hypothesis that the $\sigma$'s were not all equal. Neyman gives
reasons for supposing, in the industrial problem he is considering, that it would
be more realistic to consider another alternative hypothesis, namely, $H^*$ that
the $\sigma$'s are not all equal and that their distribution can be approximately
described by saying that $1/\sigma^2$ has a $\chi^2$ distribution. No UMP test exists but
there does exist a critical region whose power, with respect to a sub-family of $H^*$,
is independent of the means, and the corresponding test is the most powerful
test for this sub-family of alternatives. Tables of its power are furnished.
More applications are promised.

(ii) Multivariate case. The sample consists of $N$ elements, exactly as before,
except that now each $x$ is a vector in $n$ space and comes from a multivariate
normal universe whose means may be represented again by $\xi$ if we think of $\xi$
as being a vector in $n$ space. The other parameters of this universe are the
variances and covariances $a_{ij}$. So, with these changes, we may repeat the
statement at the beginning of (i) that the sample is $x_1, \ldots, x_N$, and that the
populations are $(\xi_1, a_{ij1}), \ldots, (\xi_N, a_{ijN})$. The questions to be asked about
these populations correspond exactly to those asked in the simpler case.

Problem (a): If the populations are all identical $(\xi, a_{ij})$, does $\xi = \xi_0$ (specified
in advance)? The answer is given by Hotelling's $T$ test. The hypothesis
tested is $H_0$ that the vector $\xi = \xi_0$, and the alternative hypothesis $H^*$ is that
these two vectors are not identical. P. Hsu [28] has shown that this test is the
most powerful in a special sense, and has given a new demonstration of it by the
use of the Laplace transform. Incidentally he has shown that the Laplace
transform of an elementary probability law determines the law uniquely except
perhaps at a null set of points.

Problems (b), (c), (d): Now let the same sample be arranged in $k$ groups or
columns, as in (i) b, c, d; and let it be assumed that the populations $(\xi, a_{ij})$
do not change within the columns. Problems (b), (c), and (d), with their
corresponding tests, may be indicated as follows:

(b) Are (é, a;;) constant from column to column? (The Axa) test). 

(c) Are a;; constant from column to column regardless of what, values the é’s 

may have? (The Aga) test). 

(d) Is the vector ¢ constant from column to column assuming the a;; constant 

from column to column? (The dg, test). 
Unfortunately, in the customary notation, the $\lambda$'s for this case (ii) do not follow
the pattern adopted in (i). It would be better to put ($n$) after each of the $\lambda$'s
(or $L$'s) in (i) to signify the corresponding tests in (ii). But, even if this were
agreed upon, there would still be a confused notation because there are many
other "$\lambda$" and "$L$" tests besides those listed here. Apparently⁴ the power
functions of these last three multivariate tests have not been found yet.

(e) The linear hypothesis theory was shown to be applicable to the multi- 
variate case in a special instance by P. Hsu in 1940 [38]. Since then he has 
generalized it further [45]. 

(iii) Bivariate case. This important special case of (ii) has now been pretty
thoroughly solved. A general summary of various tests which have been devised
by Finney, Pitman, Morgan, Wilks, and E. S. Pearson was given by C.
Hsu in 1940 [42], with some slight additions and with tables of power functions
with respect to certain alternatives. Altogether there are seven of these tests
corresponding to seven different problems, including the four just referred to as
Problems a, b, c, and d.


































⁴ So far as the author is aware; but he does not pretend to have made a careful search.

Part IV. The Method of Randomization. This part concerns randomization
of the individuals within a sample to obtain a method of testing hypotheses
without making use of any characteristic of the population from which the
sample was drawn. It does not deal with randomization in field experiments
to offset the effects of variable fertility. Also, in this discussion, the
hypothesis being tested is not that the sample was a random sample. It is
assumed that the given sample is random. We begin with an example from
Pitman [46]. Two samples, $(x_1, \ldots, x_N)$ and $(y_1, \ldots, y_N)$, have been drawn
at random from two populations. The means of the samples are $\bar{x}$ and $\bar{y}$,
respectively. Let $|\bar{x} - \bar{y}|$ be called the spread of these samples. Now rearrange
these same $x$'s and $y$'s with each other in all possible ways to obtain all
possible spreads. The larger the observed spread, among all these possible
spreads, the more significant it is supposed to be as a test of the (null) hypothesis
that the two populations were identical. Similarly, tests have been devised for
correlations, variances, etc.
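
The rearrangement procedure just described can be sketched as follows. The function name and the two small samples are hypothetical, and the enumeration of every rearrangement is practical only for small samples.

```python
from itertools import combinations
from statistics import fmean


def spread_test(xs, ys):
    """Return the observed spread and the fraction of rearrangements
    whose spread is at least as large as the observed one."""
    pooled = list(xs) + list(ys)
    n, m = len(xs), len(ys)
    total = sum(pooled)
    observed = abs(fmean(xs) - fmean(ys))
    at_least = 0
    arrangements = 0
    # Each choice of n pooled values to play the part of the x sample
    # is one rearrangement of the x's and y's with each other.
    for idx in combinations(range(n + m), n):
        sx = sum(pooled[i] for i in idx)
        spread = abs(sx / n - (total - sx) / m)
        if spread >= observed - 1e-12:  # tolerance for floating-point ties
            at_least += 1
        arrangements += 1
    return observed, at_least / arrangements


xs = [1.0, 2.0, 3.0, 4.0]  # made-up first sample
ys = [5.0, 6.0, 7.0, 8.0]  # made-up second sample
observed, fraction = spread_test(xs, ys)
print(f"observed spread = {observed}, fraction as large = {fraction:.4f}")
```

In this made-up case only 2 of the 70 rearrangements give a spread as large as the observed one, so the observed spread would be judged significant at any level above 2/70.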

E. S. Pearson [51] in 1938 published a criticism of this general theory which
in substance seems to be that the reason why one calls the largest spreads sig- 
nificant, rather than the smallest ones, in the illustration just used, is that one 
is assuming tacitly that the admissible populations are such that large spreads 
would be more likely on some other than the null hypothesis; that if one does 
not make some such implicit assumption, then one might quite as well call the 
smallest spreads significant; and that therefore, barring such implicit assump- 
tions, one can control only errors of the first kind by this method. 

It seems to the author that Pearson’s criticism is sound, and that, if indeed 
one is unwilling to make any assumption whatever about the populations considered,
then this device is of no⁵ value in testing the null hypothesis. For, if
all that one pretends to do is to control errors of the first kind, one can do that
by consulting a table of random numbers of two digits. Thus one can control
errors of the first kind without performing the experiment at all, let alone 
making the long computations usually required by the method of randomiza- 
tion. Or, better, one can reduce that error to zero simply by making up one’s 
mind that one will never reject the hypothesis being tested: certainly one will 
never reject it improperly if one never rejects it at all. 

However, if one is willing to make in the illustration used the very mild 
assumption that the populations considered are such that unusually large 
spreads would more probably be obtained from some admissible hypothesis 
other than the null hypothesis, then it would seem to the author that the method 
would be useful. Similar remarks apply to the tests for correlations, vari- 
ances, etc. 


⁵ Pearson's language is not so strong as this. He says "perhaps it should be described
as a valuable device rather than a fundamental principle."


REFERENCES


Part I. Theory of Tests and Part II. Estimation


[1] J. Neyman and E. S. Pearson, "On the problem of the most efficient tests of statistical
hypotheses," Phil. Trans. Roy. Soc., A, Vol. 231 (1933), pp. 289-337.
[2] J. Neyman and E. S. Pearson, "The testing of statistical hypotheses in relation to
probabilities a priori," Camb. Phil. Soc. Proc., Vol. 29 (1933), pp. 492-510.
[3] S. S. Wilks, "Test criteria for statistical hypotheses involving several variables,"
Jour. Am. Stat. Asso., Vol. 30 (1935), pp. 549-560.
[4] J. Neyman and E. S. Pearson, "Contributions to the theory of testing statistical
hypotheses. I. Unbiased critical regions of type A and type $A_1$," Stat. Res.
Mem., Univ. of London, Vol. 1 (1936), pp. 1-37.
[5] J. Neyman and E. S. Pearson, "Sufficient statistics and uniformly most powerful
tests of statistical hypotheses," Stat. Res. Mem., Vol. 1 (1936), pp. 113-137.
[6] E. J. G. Pitman, "The closest estimates of statistical parameters," Camb. Phil. Soc.
Proc., Vol. 33 (1937), pp. 212-222.
[7] J. Neyman, "Outline of a theory of estimation based on the classical theory of
probability," Phil. Trans. Roy. Soc., A, Vol. 236 (1937), pp. 333-381.
[8] J. Neyman, "On statistics the distribution of which are independent of the parameters
involved in the probability law of the original variables," Stat. Res. Mem., Vol. 2
(1938), pp. 58-59.
[9] J. Neyman and E. S. Pearson, "Contributions to the theory of testing statistical
hypotheses," Stat. Res. Mem., Vol. 2 (1938), pp. 25-57.
[10] W. Feller, "Note on regions similar to the sample space," Stat. Res. Mem., Vol. 2
(1938), pp. 117-125.
[11] J. Neyman, "Tests of statistical hypotheses which are unbiased in the limit," Annals
of Math. Stat., Vol. 9 (1938), pp. 69-86.
[12] S. S. Wilks, "Fiducial distributions in fiducial inference," Annals of Math. Stat.,
Vol. 9 (1938), pp. 272-280.
[13] A. Wald, "Contributions to the theory of estimation and testing hypotheses," Annals
of Math. Stat., Vol. 10 (1939), pp. 299-326.
[14] E. J. G. Pitman, "The estimation of the location and scale parameters of a continuous
population of any given form," Biometrika, Vol. 30 (1939), pp. 391-421.
[15] A. Wald, "Asymptotically most powerful tests of statistical hypotheses," Annals of
Math. Stat., Vol. 12 (1941), pp. 1-19.


Part III. Likelihood Tests: Special Cases


[16] H. Hotelling, "Generalized t-test," Annals of Math. Stat., Vol. 2 (1931), pp. 360-378.
[17] S. S. Wilks, "Certain generalizations in analysis of variance," Biometrika, Vol. 24
(1932), pp. 471-494.
[18] E. S. Pearson and S. S. Wilks, "Methods of statistical analysis appropriate for k
samples of two variates," Biometrika, Vol. 25 (1933), pp. 353-376.
[19] J. Neyman and E. S. Pearson, "On the problem of the most efficient tests of statistical
hypotheses," Phil. Trans. Roy. Soc., A, Vol. 231 (1933), pp. 289-337.
[20] B. L. Welch, "Some problems in analysis of regression among k samples of two
variables," Biometrika, Vol. 27 (1935), pp. 145-160.
[21] S. S. Wilks, "Test criteria for statistical hypotheses involving several variables,"
Jour. Am. Stat. Asso., Vol. 30 (1935), pp. 549-560.
[22] P. P. N. Nayer, "An investigation into the application of Neyman and Pearson's $L_1$
test, with tables of percentage limits," Stat. Res. Mem., Vol. 1 (1936), University
College, London, pp. 38-56.
[23] S. S. Wilks and C. M. Thompson, "The sampling distribution of the criterion H when
the hypothesis tested is not true," Biometrika, Vol. 29 (1937), pp. 124-132.
[24] D. J. Finney, "The distribution of the ratio of estimates of the two variances in a
sample from a normal bivariate population," Biometrika, Vol. 30 (1938), pp.
190-192.
[25] D. N. Lawley, "A generalization of Fisher's z-test," Biometrika, Vol. 30 (1938), pp.
180-187.
[26] P. L. Hsu, "Contribution to the theory of 'Student's' t-test as applied to the problem
of two samples," Stat. Res. Mem., Vol. 2 (1938), pp. 1-24.
[27] P. L. Hsu, "On the best unbiased quadratic estimate of the variance," Stat. Res. Mem.,
Vol. 2 (1938), pp. 91-104.















[28] P. L. Hsu, "Notes on Hotelling's generalized T," Annals of Math. Stat., Vol. 9 (1938),
pp. 231-244.
[29] S. S. Wilks, "The large-sample distribution of the likelihood ratio for testing
composite hypotheses," Annals of Math. Stat., Vol. 9 (1938), pp. 60-63.
[30] F. N. David and J. Neyman, "Extension of the Markoff theorem on least squares,"
Stat. Res. Mem., Vol. 2 (1938), pp. 105-116.
[31] D. J. Bishop and U. S. Nair, "A note on certain methods of testing for the homogeneity
of a set of estimated variances," Jour. Roy. Stat. Soc., Supp., Vol. 6 (1939),
pp. 89-99.
[32] G. W. Brown, "On the power of the $L_1$ test for the equality of several variances,"
Annals of Math. Stat., Vol. 10 (1939), pp. 119-128.
[33] E. J. G. Pitman, "A note on normal correlation," Biometrika, Vol. 31 (1939), pp. 9-12.
[34] W. A. Morgan, "A test for the significance of the difference between the two variances
in a sample from a normal bivariate population," Biometrika, Vol. 31 (1939),
pp. 13-19.
[35] D. J. Bishop, "On a comprehensive test for the homogeneity of variances and
covariances in multivariate problems," Biometrika, Vol. 31 (1939), pp. 31-55.
[36] E. J. G. Pitman, "Tests of hypotheses concerning location and scale parameters,"
Biometrika, Vol. 31 (1939), pp. 200-215.
[37] P. O. Johnson and J. Neyman, "Tests of certain linear hypotheses and their
application to some educational problems," Stat. Res. Mem., Vol. 1 (1939), pp. 57-93.
[38] P. L. Hsu, "On generalized analysis of variance, I," Biometrika, Vol. 31 (1940), pp.
221-237.
[39] P. C. Tang, "The power function of analysis of variance tests with tables and
illustrations of their use," Stat. Res. Mem., Vol. 2 (1938), pp. 126-149.
[40] H. O. Hartley, "Testing the homogeneity of a set of variances," Biometrika, Vol. 31
(1940), pp. 249-255.
[41] J. F. Daly, "On the unbiased character of likelihood ratio tests for independence in
normal systems," Annals of Math. Stat., Vol. 11 (1940), pp. 1-32.
[42] C. T. Hsu, "On samples from a normal bivariate population," Annals of Math. Stat.,
Vol. 11 (1940), pp. 410-426.
[43] J. Neyman, "A statistical problem arising in routine analysis and in sampling
inspections of mass production," Annals of Math. Stat., Vol. 12 (1941), pp. 46-76.
[44] A. Wald and R. J. Brookner, "On the distribution of Wilks' statistic for testing the
independence of several groups of variates," Annals of Math. Stat., Vol. 12
(1941), pp. 137-152.
[45] P. L. Hsu, "Canonical reduction of the general regression problem," Annals of
Eugenics, Vol. 11 (1941), pp. 42-46.


Part IV. Randomization Tests


[46] E. J. G. Pitman, "Significance tests which may be applied to samples from any
population, (I)," Jour. Roy. Stat. Soc. Supp., Vol. 4 (1937), pp. 119-130.
[47] E. J. G. Pitman, "Significance tests which may be applied to samples from any
population, (II), The correlation coefficient test," Jour. Roy. Stat. Soc. Supp., Vol. 4
(1937), pp. 225-232.
[48] E. J. G. Pitman, "Significance tests which may be applied to samples from any
population, (III), The analysis of variance test," Biometrika, Vol. 29 (1938), pp. 322-335.
[49] B. L. Welch, "On the z-test in randomized blocks and Latin squares," Biometrika,
Vol. 29 (1937), pp. 26-52.
[50] B. L. Welch, "On tests for homogeneity," Biometrika, Vol. 30 (1938), pp. 149-158.
[51] E. S. Pearson, "Some aspects of the problem of randomization," Biometrika, Vol. 29
(1938), pp. 53-64.

Note: None of the 1941 Biometrika was received until after this paper had been read and 
prepared for publication. 











RECENT ADVANCES IN MATHEMATICAL STATISTICS, II¹


By Cecil C. Craig
University of Michigan





The statistical theory of the linear relationship between a dependent variable
$x_1$ and a set of independent variables $x_2, x_3, \ldots, x_{t+1}$ is by now quite
generally understood. Supposing that the $x_i$'s are measured from their respective
means, we determine the coefficients, $b_2, b_3, \ldots, b_{t+1}$, in such a way as to
maximize the coefficient of correlation $r_{1.23 \ldots (t+1)}$ between $x_1$ and
$\sum b_i x_i$. This coefficient of correlation, usually called the multiple correlation
coefficient, measures the exactness of the linear relationship that exists, and it has
the property of being quite unchanged if the origins or the scales for the separate
$x_i$'s are changed in any way or even if the set $x_2, x_3, \ldots, x_{t+1}$ should be
replaced by any equivalent set of linear combinations of them. That is, e.g., if t = 3,
the new variables, $v_2 = x_2 + x_3 + x_4$, $v_3 = 2x_2 - x_3 + 3x_4$,
$v_4 = x_2 + 2x_3 - 2x_4$ are equivalent to $x_2, x_3, x_4$, since the latter can be found
if the $v_i$'s are known, and the multiple correlation between $x_1$ and the $v_i$'s is
exactly the same as that between $x_1$ and $x_2, x_3, x_4$. Moreover, the requisite
sampling theory if the variables involved are normally distributed is well established.
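
The invariance just described can be checked numerically. The following is only a sketch under stated assumptions: the data are simulated, the helper names `solve`, `center` and `multiple_correlation` are illustrative, and the three combinations $v_2, v_3, v_4$ use one reading of the coefficients in the example (any invertible set of combinations would serve equally well).

```python
import math
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][j] * x[j] for j in range(r + 1, n))) / m[r][r]
    return x

def center(col):
    """Measure a variable from its mean."""
    mean = sum(col) / len(col)
    return [v - mean for v in col]

def multiple_correlation(y, predictors):
    """Correlation between y and its least-squares linear predictor."""
    y = center(y)
    xs = [center(p) for p in predictors]
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    gram = [[dot(p, q) for q in xs] for p in xs]
    b = solve(gram, [dot(p, y) for p in xs])
    fitted = [sum(bi * p[i] for bi, p in zip(b, xs)) for i in range(len(y))]
    return dot(y, fitted) / math.sqrt(dot(y, y) * dot(fitted, fitted))

# Simulated (hypothetical) data: x1 depends linearly on x2, x3, x4 plus noise.
random.seed(1)
n = 40
x2 = [random.gauss(0, 1) for _ in range(n)]
x3 = [random.gauss(0, 1) for _ in range(n)]
x4 = [random.gauss(0, 1) for _ in range(n)]
x1 = [a + 0.5 * b - c + random.gauss(0, 1) for a, b, c in zip(x2, x3, x4)]

# An equivalent set of linear combinations of x2, x3, x4.
v2 = [a + b + c for a, b, c in zip(x2, x3, x4)]
v3 = [2 * a - b + 3 * c for a, b, c in zip(x2, x3, x4)]
v4 = [a + 2 * b - 2 * c for a, b, c in zip(x2, x3, x4)]

r_x = multiple_correlation(x1, [x2, x3, x4])
r_v = multiple_correlation(x1, [v2, v3, v4])
```

Up to rounding error, `r_x` and `r_v` agree, since both sets of predictors span the same linear combinations.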

I want to discuss briefly an important generalization of this kind of situation
that has been the subject of recent research. In particular, in his paper, "Relations
between two sets of variates," published in Biometrika in 1936 [1] H. Hotelling set
forth these ideas in excellent fashion and contributed much to the mathematical theory
required for their practical application. We now suppose that we have two sets of
measurements, $x_1, \ldots, x_s$, and $x_{s+1}, \ldots, x_{s+t}$, made on the same object
and that we are interested in the linear relations that may exist between the members
of one set and the members of the other. As an example, $x_1, \ldots, x_s$ might be the
prices of s more or less related commodities at a given time, and
$x_{s+1}, \ldots, x_{s+t}$ measures of factors which may be thought to be effective in
the price situation.

In the more special case I began with, s = 1, and a single equation fully expressed
the linear statistical relationship of $x_1$ with $x_2, \ldots, x_{t+1}$. Now there
are s dependent variables and now with $s \le t$, not one but s distinct linear
relations will exist and will be required to fully describe the linear connections
between the two sets of variables. We may assume that there is no mere
duplication among the variables we are using, i.e., no one of the s $x_i$'s is always
exactly given by a linear combination of the others in the set and the same is
also true of the set $x_{s+1}, \ldots, x_{s+t}$. Now there is no logical or mathematical
necessity for the way in which we are so far using our measurements. Suppose
s = 2 and t = 3. We can find the best linear regression equation for $x_1$ on
$x_3, x_4, x_5$ and then find the like equation for $x_2$ on $x_3, x_4, x_5$. But we could
very possibly get more meaning out of the situation if we began by replacing $x_1$
and $x_2$ by, say, $u_1 = x_1 + x_2$ and $u_2 = x_1 - x_2$ and similarly replacing
$x_3, x_4, x_5$ by three $v_i$'s formed from these three x's in a similar fashion. We have
really been making a quite arbitrary choice among the u's and v's that could be used
and the question presents itself: What significance is there in the way we choose
our u's and v's?

¹ This is the second of two papers read by B. H. Camp and the author on "Recent
Advances in Mathematical Statistics" before the American Statistical Association, the
Econometric Society, and the Institute of Mathematical Statistics, on December 30, 1941,
in New York City. The authors selected topics from papers published during the past five
years.

It turns out to be much more than a merely reasonable beginning to try to
determine a u from the first set and a v from the second in such a way that they
will be more closely correlated than any other u and v formed in this linear
fashion from the s x's in the first set and the t x's in the second. That is, we set,

$$u = \sum_{i=1}^{s} a_i x_i \quad\text{and}\quad v = \sum_{i=s+1}^{s+t} b_i x_i,$$

and determine the $a_i$'s and the $b_i$'s which will maximize $r_{uv}$. We may say that
this u and v will account for more of the linear dependence of $x_1, \ldots, x_s$ upon
$x_{s+1}, \ldots, x_{s+t}$ than will any other u and v. To the mathematician familiar
things begin to appear, though, as Hotelling remarks, in its purely mathematical
form this problem seems to be new. A very important observation is the fact
that this maximum $r_{uv}$ would be quite unaffected by any change in origin or
scale on any of the x's; it is even unaffected if we should begin by replacing the
first s x's by any equivalent set of s linear combinations of them as new variables
to work with and by doing the same thing on the second set of t x's. Hotelling
makes use of this circumstance to greatly simplify his mathematical developments.

Now things fall out in a very interesting way. One actually solves not for the
a's and b's at first but instead for the maximized $r_{uv}$. Having this, the
corresponding a's and b's can then be found. But generally the equation for $r_{uv}$
gives not one but s different values for $r_{uv}$! What is the meaning of the s
different $r_{uv}$'s? Well, you remember that I said that s relations ($s \le t$) would
appear to exist between the two sets of variables. These s $r_{uv}$'s correspond to
those s linear relations which are picked out in a unique way. We now have
s u, v pairs which are independent of each other in the sense that no u or v is
correlated with any other u or v with the exception of the other member of its
pair, and of course this correlation is precisely the $r_{uv}$ by which the pair was
determined. Further, the largest $r_{uv}$ gives the maximum u and v we set out
to find; the second largest $r_{uv}$ determines the pair u, v of maximum correlation
among those independent, in the sense just described, of the first pair; the third
largest $r_{uv}$ leads to the u, v of maximum correlation among those independent of
the first two pairs, and so on. The s independent linear relations among them
completely describe the linear statistical dependence of the one set of variables
upon the other. The relations are essentially those between the u, v pairs and
the closeness of these is measured by $r_1, r_2, \ldots, r_s$, which I write for the
s $r_{uv}$'s. The new variables are called canonical variables and the correlations
between them canonical correlations. We may say that the maximum pair,
u, v, gives both the best linear predictor that can be formed from
$x_{s+1}, \ldots, x_{s+t}$ and also the linear combination of $x_1, \ldots, x_s$ that can be
best predicted.

I have to try to deal briefly with the numerous ideas and results in this paper
which is not unrelated to earlier work by the author and by S. S. Wilks. First,
what about an over all measure of the linear connection between the two sets
of variables? It is shown that

$$q = \pm r_1 r_2 \cdots r_s \quad\text{and}\quad z = (1 - r_1^2)(1 - r_2^2) \cdots (1 - r_s^2),$$

have properties that make it appropriate to call the first the (vector) correlation
coefficient between the two sets and the second the coefficient of alienation.
Both are simply expressed by means of determinants of the covariances (product
moments) among the s + t x's. For example, if s = 1, q is simply
$r_{1.23 \ldots (t+1)}$. If s = t = 2,

$$q = \frac{r_{13} r_{24} - r_{14} r_{23}}{\sqrt{(1 - r_{12}^2)(1 - r_{34}^2)}},$$

the numerator of which is the tetrad difference of the psychologists. Further,
if it should happen that $x_2$ and $x_4$ are identical, this q becomes $r_{13.2}$.
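
For s = t = 2 the relation between q, z and the canonical correlations can be verified directly. The sketch below assumes the standard result, not spelled out in the text, that the squared canonical correlations are the latent roots of $S_{11}^{-1} S_{12} S_{22}^{-1} S_{21}$, where the $S$'s are the blocks of the correlation matrix; the six correlations are hypothetical values chosen only for illustration.

```python
import math

def inv2(m):
    """Inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mul2(p, q):
    """Product of two 2 x 2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def transpose2(m):
    return [[m[0][0], m[1][0]], [m[0][1], m[1][1]]]

# Hypothetical correlations: x1, x2 form the first set and x3, x4 the second.
r12, r34 = 0.3, 0.2
r13, r14, r23, r24 = 0.5, 0.1, 0.2, 0.4

s11 = [[1.0, r12], [r12, 1.0]]
s22 = [[1.0, r34], [r34, 1.0]]
s12 = [[r13, r14], [r23, r24]]

# Squared canonical correlations as the two roots of a 2 x 2 characteristic equation.
m = mul2(mul2(inv2(s11), s12), mul2(inv2(s22), transpose2(s12)))
trace = m[0][0] + m[1][1]
det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
disc = math.sqrt(trace * trace - 4.0 * det)
r1_sq, r2_sq = (trace + disc) / 2.0, (trace - disc) / 2.0

# The closed form for q when s = t = 2, and the coefficient of alienation z.
q = (r13 * r24 - r14 * r23) / math.sqrt((1 - r12 ** 2) * (1 - r34 ** 2))
z = (1 - r1_sq) * (1 - r2_sq)
```

Here `q * q` equals `r1_sq * r2_sq` up to rounding, which is the statement $q = \pm r_1 r_2$ for this case.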

In an application, of course, the various quantities appearing above will have
to be calculated from an observed set of values of $x_1, \ldots, x_s$,
$x_{s+1}, \ldots, x_{s+t}$. Hotelling adapts an iterative process he had previously given
to calculating the canonical $r_1, \ldots, r_s$ from which the canonical variables can be
found, and he numerically illustrates the whole procedure. But what is more difficult
is to solve the sampling problems that arise. It is very helpful to assume that all
the x's obey a multiple normal frequency law.

First, Hotelling derives expressions for the standard errors of the r's and of
q and z which are approximations useful for large samples. But for small
samples exact sampling distributions are needed. Wilks [2] had earlier studied
the exact sampling distribution of z in the case in which we are interested, that
in the population the set $x_1, \ldots, x_s$ is completely independent of the set
$x_{s+1}, \ldots, x_{s+t}$, though he did not leave his general result in a form suitable
for calculation. Hotelling now finds the distribution function for q for s = 2.
The result is not in all cases simple in form but numerical values can be obtained
from it. The relations between these two possible tests, one based on z and the
other based on q, are discussed at length.

An obvious undertaking would be to try to find the exact joint sampling
distribution of the canonical correlations for any s and t, and I will say something
about the very interesting papers in which this problem was solved. But
some of this later work arose in a different though related setting which I want
to discuss briefly first.

In 1936 R. A. Fisher published "The use of multiple measurements in taxonomic
problems," [3] which was the introduction of linear discriminant functions
to the statistical world. Suppose that $N_1$ random individuals of one race
(species, variety, etc.) have been measured with respect to each of k characteristics
and that $N_2$ random individuals of another race have been similarly
measured. What linear combination of these measurements would serve best
to distinguish members of one race from those of the other? An example used
by Fisher in this paper was that of two samples of 50 plants each of two varieties
of iris found growing together in the same colony. In the flower on each plant
there was measured the sepal length, $x_1$, the sepal width, $x_2$, the petal length,
$x_3$, and the petal width, $x_4$. What linear function,

$$X = \lambda_1 x_1 + \lambda_2 x_2 + \lambda_3 x_3 + \lambda_4 x_4,$$

would enable one to most surely identify the variety to which each single plant
belongs? To choose such an X Fisher proposed the mathematical principle that
the coefficients, $\lambda_i$, i = 1, 2, 3, 4, be determined so that the square of the
difference between the average value of X in the one variety and the average value in
the other, divided by the sum of squares of the X's taken about the two group means,
shall be a maximum. Then quite simple mathematics leads to the required numerical
values of the $\lambda_i$'s.
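
A small numerical sketch may make the principle concrete. It is written for two measurements per individual so that the normal equations can be solved in closed form, the data are invented (they are not Fisher's iris measurements), and the names `discriminant` and `criterion` are illustrative. The criterion is taken to be the squared difference of the group means of X divided by the within-group sum of squares of X.

```python
def mean(values):
    return sum(values) / len(values)

def discriminant(group_a, group_b):
    """Coefficients maximizing (difference of means of X)^2 / (within sum of squares of X)."""
    means_a = [mean(col) for col in zip(*group_a)]
    means_b = [mean(col) for col in zip(*group_b)]
    d = [ma - mb for ma, mb in zip(means_a, means_b)]
    # Pooled within-group sums of squares and products of the two measurements.
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows, m in ((group_a, means_a), (group_b, means_b)):
        for row in rows:
            dev = [row[0] - m[0], row[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += dev[i] * dev[j]
    # Solve s * lam = d for the 2 x 2 case by Cramer's rule.
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    lam = [(s[1][1] * d[0] - s[0][1] * d[1]) / det,
           (s[0][0] * d[1] - s[1][0] * d[0]) / det]
    return lam, d, s

def criterion(lam, d, s):
    """Squared difference of group means of X over the within-group sum of squares of X."""
    diff = lam[0] * d[0] + lam[1] * d[1]
    within = sum(lam[i] * s[i][j] * lam[j] for i in range(2) for j in range(2))
    return diff * diff / within

# Invented measurements (x1, x2) on five individuals from each of two groups.
group_a = [(4.0, 2.0), (4.5, 2.4), (5.0, 2.1), (5.5, 2.9), (6.0, 2.6)]
group_b = [(5.0, 3.5), (5.5, 3.4), (6.0, 4.1), (6.5, 3.8), (7.0, 4.2)]

lam, d, s = discriminant(group_a, group_b)
best = criterion(lam, d, s)
```

The coefficients are determined only up to a common factor, so any multiple of `lam` gives the same value of the criterion, and no other direction gives a larger one.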

But now that we have set up such an instrument as X, there is a more interesting
use to which it can be put. Suppose that the question were to establish
that the $N_1$ individuals from the one group and the $N_2$ individuals from the other
really belong to different races distinguishable with respect to the complex of
characters we have chosen to measure in each. We are on the old question of
racial likeness or unlikeness and obviously the word "race" may have a meaning
broad enough to give this work of Fisher's wide application indeed. Subject
to the principle according to which the coefficients $\lambda_i$ are determined from
sample sets of measurements, X is the best possible linear discriminant function. We
are now faced with the question of the statistical significance of the difference
between the means of X for each group compared to the above mentioned internal
sum of squares.

It is generally useful and enlightening in a problem of this general nature
turning on the use of linear and quadratic forms to consider its interpretation
as an analysis of variance or covariance. Fisher readily provides such a set-up
in this case by assigning to the quality of belonging to race A a numerical value,
$y_1$, the same for all members of that race, and by assigning in like fashion a
different numerical value, $y_2$, to the quality of belonging to race B. It is
mathematically convenient if we have samples of $N_1$ and $N_2$ from races A and B
respectively, to let

$$y_1 = \frac{N_2}{N_1 + N_2} \quad\text{and}\quad y_2 = \frac{-N_1}{N_1 + N_2},$$

for then over the combined sample of $N_1 + N_2$, we have,

$$S(y) = 0 \quad\text{and}\quad S(y^2) = \frac{N_1 N_2}{N_1 + N_2}.$$
















This may seem somewhat arbitrary at first glance, but let us start anew by
writing the linear regression equation,

$$y = \sum_{i=1}^{k} b_i (x_i - \bar{x}_i),$$

in which y takes on one of the two values above and in which $\bar{x}_i$ is the mean of
$x_i$ in the combined sample, and then proceeding to determine the $b_i$'s in the usual
least squares fashion. The $b_i$'s turn out to be proportional to the $\lambda_i$'s
previously found. Now the total variance of the y's is analyzed into that within groups
and that between groups and it is immediately suggested that the usual z-test
with k and N - k - 1 degrees of freedom, where N = $N_1 + N_2$, is the appropriate one.
But, as Fisher remarks, ordinarily for the application of this test one postulates a
population in which the y's have a normal distribution for each fixed set of values of
$x_1, x_2, \ldots, x_k$. Here, however, the y remains fixed and one postulates a
normal distribution of the x's associated with a given value of y. Not to leave
this matter in doubt, though I shall return to it, I may remark that Fisher
noted that earlier work by Hotelling [4] showed that the z-test is nevertheless
the proper one to use.
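
The two sums over the combined sample, and the proportionality of the regression coefficients to the discriminant coefficients, can be written out in a few lines. The notation here is an assumption introduced for the sketch: $N = N_1 + N_2$, $y$ takes the values $N_2/N$ in race A and $-N_1/N$ in race B, $d$ is the vector of differences between the group means of the $x_i$, $W$ is the within-group matrix of sums of squares and products (so that $\lambda \propto W^{-1} d$), and $T$ is the corresponding total matrix. The last step uses the Sherman-Morrison identity.

```latex
\begin{align*}
S(y) &= N_1 \frac{N_2}{N} - N_2 \frac{N_1}{N} = 0, \\
S(y^2) &= N_1 \frac{N_2^2}{N^2} + N_2 \frac{N_1^2}{N^2} = \frac{N_1 N_2}{N}, \\
S\{y\,(x_i - \bar{x}_i)\} &= \frac{N_1 N_2}{N}\, d_i, \\
T &= W + c\, d d^{\mathsf{T}}, \qquad c = \frac{N_1 N_2}{N}, \\
b &= c\, T^{-1} d = \frac{c}{1 + c\, d^{\mathsf{T}} W^{-1} d}\; W^{-1} d \;\propto\; \lambda .
\end{align*}
```

The scalar factor in the last line is the same for every coefficient, which is why the least squares $b_i$ differ from the $\lambda_i$ only by a common multiplier.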

I have to be brief indeed concerning linear discriminant functions. Fisher 
wrote further papers dealing with them in 1938 [5], 1939 [6], and 1940 [7] and 
among others, Mahalanobis [8], Bose [9, 10], and Roy [10], of the “Calcutta 
School” have made relevant contributions. In particular, Mahalanobis [8] 
introduced the concept of the generalized distance by which two sets of multiple 
measurements differ, which has an obvious connection with the present subject. 
Fisher also discussed a test for the direction in k-space in which two such samples 
differ most and in case we have three such samples from three different races 
provided a test for their collinearity. 

In his 1939 paper mentioned above [6], Fisher called attention to the connections
between the theory of linear discriminant functions and Hotelling's canonical
correlations. Of course it can be said at once that a linear discriminant
function arises as the very special case of investigating the linear relationship
between the artificially introduced y and $x_1, x_2, \ldots, x_k$. And the test of
significance based on the analysis of variance turns on the ratio of the sum of
squares due to regression, i.e., among the predicted values, to the total sum of
squares for the regression and for the residuals. This analysis is quite general
in form and can equally well be set up if one is predicting linear forms formed
from $N_1$ variables from linear forms made up from $N_2$ other variables. If one
sets up the condition that this ratio, $\theta$, be a maximum one is led, as Fisher shows,
to a determinantal equation in $\theta$, the roots of which are the squares of Hotelling's
canonical correlations.

Mathematically the general problem we are interested in is equivalent to the
following: We have a sample of $N_1 + N_2$ observed values of p normally distributed
variables. If $a_{ij}$ is the covariance of the i-th and j-th variables in the
sample of $N_1$ and $b_{ij}$ the like covariance in the sample of $N_2$ we want the
sampling distribution of the roots of the determinantal equation:

$$|a_{ij} - \theta (a_{ij} + b_{ij})| = 0,$$

under the hypothesis that the first sample is independent of the second. This
problem Fisher solved in his 1939 paper though in his characteristically concise
and intuitive manner. But in the same number of the Annals of Eugenics,
P. L. Hsu [11], at Fisher's suggestion, gave a complete analytical solution. Hsu
also showed more in detail how the result applies to Hotelling's case of N observations
on s + t normally distributed variables in which the set of s is independent
of the second set of t. In his 1936 paper Hotelling gave the result for
s = t = 2 and in 1939, Girshick [12] gave the solution for s = 2 and t > 2.
Hsu showed, too, the striking fact, mentioned by Fisher, that it is sufficient
that only one of the two sets of s and of t variables be normally distributed in
order that the distribution function found apply. This provides the explanation
of why the test of significance applied by Fisher for linear discriminant functions
is valid even though the y introduced had an arbitrary distribution of values.

The simultaneous distribution of the canonical correlations is fundamental
but on finding it not all difficulties are thereby resolved. As mentioned above,
either of the quantities, z or q, as they appear in Hotelling's paper, furnishes an
over all test, or rather it would if its distribution function were obtained in a
satisfactory form. The form of the distribution of z for complete independence
was given by Wilks as early as 1932 [2] but that of q for s > 2 is still lacking.
For s > 2 there are difficulties in applications even with z and in 1938 [13]
M. S. Bartlett proposed a more convenient approximate test. Ordinarily, however,
one would want to test the largest canonical correlation alone for significance.
There are two kinds of trouble here. First, there is no assurance that
the largest observed canonical correlation corresponds to the largest one in the
population. Second, it is quite important to know whether the remaining population
correlations are zero or not. Bartlett in 1941 [14] discussed these points.

Now I make an abrupt change in subject. Some interesting work has been 
done on the theory of runs and its applications during the last five years. 

First, I want to try to convey some idea of the contents of three papers by 
W. D. Kermack and A. G. McKendrick published in 1937 [15, 16] and 1938 [17]. 
Suppose we have an unlimited set of numbers, no two of which are equal, and 
start drawing from them at random, recording the numbers in sequence as they 
come. Within the sequence drawn there will occur runs up and runs down of 
varying lengths. Thus in the sequence of 10 numbers, 2, 5, 11, 8, 9, 4, 3, 7, 14, 
12, there are 3 runs up, one of length 2 and 2 of length 3, and 3 runs down, 
2 of length 2 and one of length 3. Both ends of a run are counted in finding its 
length; no run can have a length less than 2. The total number of runs is 6 of 
which 3 are of length 2 and 3 are of length 3. We can also count the gaps which 
extend from crest to crest or from trough to trough and note their lengths with 
the convention that again both ends are counted in determining a length, so 
that no gap length is less than 3. Thus in the sequence of 10 numbers above 
there is one gap of length 3, 3 of length 4, and one of length 5. 
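
The counting conventions above are easy to mechanize. The following sketch reproduces the counts for the ten-number example; the function names are illustrative, and treating the first and last numbers of the sequence as turning points is an assumption, adopted because it is the convention that yields the five gaps stated in the text.

```python
from collections import Counter

def runs_up_down(seq):
    """Lengths of runs up and runs down, counting both end points of each run."""
    ups, downs = [], []
    start = 0
    for i in range(1, len(seq)):
        rising = seq[i] > seq[i - 1]
        # A run ends at the last element or where the direction reverses.
        if i == len(seq) - 1 or (seq[i + 1] > seq[i]) != rising:
            (ups if rising else downs).append(i - start + 1)
            start = i
    return ups, downs

def gaps(seq):
    """Crest-to-crest and trough-to-trough lengths, both end points counted."""
    last = len(seq) - 1
    crests, troughs = [], []
    for i in range(len(seq)):
        # The first and last values are treated as turning points (assumed convention).
        left_lower = i == 0 or seq[i - 1] < seq[i]
        right_lower = i == last or seq[i + 1] < seq[i]
        left_higher = i == 0 or seq[i - 1] > seq[i]
        right_higher = i == last or seq[i + 1] > seq[i]
        if left_lower and right_lower:
            crests.append(i)
        elif left_higher and right_higher:
            troughs.append(i)
    return [b - a + 1 for pts in (crests, troughs) for a, b in zip(pts, pts[1:])]

sequence = [2, 5, 11, 8, 9, 4, 3, 7, 14, 12]
ups, downs = runs_up_down(sequence)
gap_counts = Counter(gaps(sequence))

print(sorted(ups), sorted(downs))    # [2, 3, 3] [2, 2, 3]
print(sorted(gap_counts.items()))    # [(3, 1), (4, 3), (5, 1)]
```

The code assumes, as the text does, that no two numbers in the sequence are equal.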

It is clear that if we know the distribution for runs or for gaps of different
lengths we can compare an observed sequence, or rather an observed distribution
of runs or gaps by lengths, with the frequencies calculated on the hypothesis
of randomness and be by way of acquiring a test for this hypothesis. To be
brief, in these papers these theoretical distributions are found together with
their means and variances. There are some interesting applications. Tippett's
random sampling numbers and a series of reversed telephone numbers both
passed the $\chi^2$-test as random and also passed the test based on the departure of
the mean from its expected value compared with its standard deviation. On
the other hand, the series of Swedish death rates for the period 1740-1930
could not conceivably be random. This investigation was prompted in the first
place by the fluctuations of the death rate from ectromelia in mice in an
experimentally induced epidemic.

The problems here dealt with had been only partially solved by earlier writers. 
There is much interesting material in these papers I have no space for. The 
authors readily include the case in which the numbers composing the population 
are not all different. They also studied series of limited length, series arranged 
in a cycle or ring and even what may be termed a Möbius cycle.

A. M. Mood in 1940 [18] in an interesting paper investigated a different form
of the problem of runs. Suppose we have n elements of two kinds, say $n_1$ a's
and $n_2 = n - n_1$ b's, and that these are arranged at random in a row. For
example, if $n_1 = 5$ and $n_2 = 7$, and if a random arrangement of the 12 a's and b's
is babbbabbaaab, the a's occur in 2 runs of one and in one run of 3 and the b's
come in 2 runs of one, in one run of 2 and in one run of 3. If $r_{ij}$ (i = 1, 2) is
the number of runs of length j of elements of variety i, Mood finds the probability of
obtaining a given set of values of $r_{ij}$ such that $\sum_j j\, r_{ij} = n_i$ (i = 1, 2),
i.e., of obtaining a given pattern of runs in the two kinds of objects. Besides this
basic distribution function he obtains certain marginal distributions such as
that for the occurrence of a given set of runs in the a's regardless of how the b's
fall (except that they must provide the necessary points of division), or that for
$r_1$ and $r_2$ if these are respectively the total number of runs of a's and of b's, or
that for $r_1$ or $r_2$ alone. He finds the factorial moments of these variables and
then their means, variances and covariances. Similar results are obtained in
case there are more than two kinds of elements. In the second part of the
paper, Mood turns to the case of drawings from an infinite population in which
articles of two or more kinds occur in fixed proportions. Finally, in both of the
two kinds of drawings considered he derives the limiting forms of the distributions
studied as the sample size increases. As Mood notes, here, too, a few of
the results had previously been found, but this paper is the first really thoroughgoing
investigation of its subject.
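
The run pattern of the twelve-letter example can be tabulated mechanically, and the probability of a given pattern follows from a short counting argument. The sketch below is an illustration rather than a transcription of Mood's formulas: it assumes all arrangements of the a's and b's are equally likely, counts the orderings of the runs within each kind, and allows two interleavings when the numbers of runs are equal and one when they differ by one. The function names are hypothetical.

```python
from collections import Counter
from itertools import groupby
from math import comb, factorial

def run_pattern(arrangement):
    """Map each kind of element to a Counter {run length: number of runs}."""
    pattern = {}
    for kind, block in groupby(arrangement):
        pattern.setdefault(kind, Counter())[len(list(block))] += 1
    return pattern

def pattern_probability(runs_a, runs_b):
    """Probability of a run pattern when every arrangement is equally likely."""
    n1 = sum(j * c for j, c in runs_a.items())
    n2 = sum(j * c for j, c in runs_b.items())
    r1, r2 = sum(runs_a.values()), sum(runs_b.values())
    # Runs of the two kinds must alternate along the row.
    interleavings = 2 if r1 == r2 else 1 if abs(r1 - r2) == 1 else 0
    orderings = 1
    for runs, r in ((runs_a, r1), (runs_b, r2)):
        # Number of distinct orders in which runs of the given lengths can appear.
        orderings *= factorial(r)
        for c in runs.values():
            orderings //= factorial(c)
    return interleavings * orderings / comb(n1 + n2, n1)

pattern = run_pattern("babbbabbaaab")
probability = pattern_probability(pattern["a"], pattern["b"])
```

For the example, `pattern["a"]` records two runs of length 1 and one of length 3, and `pattern["b"]` records two runs of length 1, one of length 2 and one of length 3, as stated in the text.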

In a paper antedating Mood's by some six months, A. Wald and J. Wolfowitz [19]
used the distribution function for the total number of runs (irrespective
of length) for arrangements of fixed numbers of two kinds of elements to provide
a test of the hypothesis that two samples have come from the same population
with a continuous distribution law. If the observations in the two samples
combined are arranged in order of magnitude and if then the observations from
the first sample are each replaced by a zero and those from the second are each
replaced by a one, we have a situation to which this distribution function for
runs applies. W. L. Stevens in 1939 [20] also discussed an application of this
distribution.
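
A sketch of the procedure follows. The two samples are invented, the function names are illustrative, and the null distribution of the total number of runs is the usual combinatorial one for $n_1$ zeros and $n_2$ ones arranged at random; a small number of runs is taken as evidence against the hypothesis, which is an assumption about how the test is applied.

```python
from itertools import groupby
from math import comb

def total_runs(sample_1, sample_2):
    """Number of runs of 0's and 1's after pooling and ordering the two samples."""
    labelled = sorted([(v, 0) for v in sample_1] + [(v, 1) for v in sample_2])
    return sum(1 for _ in groupby(label for _, label in labelled))

def runs_pmf(n1, n2):
    """Null probabilities P(U = u) for the total number of runs U."""
    total = comb(n1 + n2, n1)
    pmf = {}
    for u in range(2, n1 + n2 + 1):
        k = u // 2
        if u % 2 == 0:
            # k runs of each kind, starting with either kind.
            ways = 2 * comb(n1 - 1, k - 1) * comb(n2 - 1, k - 1)
        else:
            # k + 1 runs of one kind and k of the other.
            ways = (comb(n1 - 1, k) * comb(n2 - 1, k - 1)
                    + comb(n1 - 1, k - 1) * comb(n2 - 1, k))
        if ways:
            pmf[u] = ways / total
    return pmf

# Invented samples with no ties, as the continuity assumption requires.
sample_1 = [1.2, 0.4, 2.3, 1.9, 0.7]
sample_2 = [3.1, 2.8, 4.0, 2.5, 3.6]

observed = total_runs(sample_1, sample_2)
pmf = runs_pmf(len(sample_1), len(sample_2))
# Probability of as few runs as observed, or fewer, under the hypothesis.
p_value = sum(p for u, p in pmf.items() if u <= observed)
```

In this invented case the two samples separate completely, so only two runs are observed and the probability of so few runs is 2/252.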

The third principal topic I have chosen for my remarks is developments in
the use of the probability integral transformation. The use of this device at all
seems to be quite recent, appearing in a paper by H. Cramér in 1928 [21] who
invented a test of goodness of fit which reappeared as the "$\omega^2$-test" in apparently
independent work of R. von Mises in 1931 [22]. In 1932 in a section new in the
fourth edition of "Statistical Methods for Research Workers," [23] Fisher showed
the usefulness of this transformation in combining independent tests of significance
and in 1933 and 1934 Karl Pearson [24, 25] had papers in Biometrika on
the subject.

As for the transformation itself, suppose that p(x) is the probability density
function of a continuous variable x defined on the range (a, b) such that,

$$\int_a^b p(x)\,dx = 1.$$

Then let us introduce the variable,

$$y = \int_a^x p(x)\,dx,$$

which is the probability that a value of the variable at random will be less than x.
It will be seen that since x is a random variable, the proportion of population
values less than an x drawn at random is itself a random variable. Perhaps
this will be clearer if I use a simple example of J. Neyman's to show how a
sample of x's also determines a sample of y's for a given p(x). Suppose that,

$$p(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2},$$

and that a sample of 5 values of x arranged in order of magnitude is: -1.5,
-1.1, -0.5, 0.6, 1.6. Then by reference to a table of areas under the normal
curve of error, we find that the corresponding observed y's are: 0.067, 0.136,
0.309, 0.726, 0.945. It is obvious that the range for y is always, for any p(x),
(0, 1). Further if f(y) is the probability density function for y, of course,

$$f(y)\,dy = p(x)\,dx.$$

But from the definition of y,

$$dy = p(x)\,dx,$$

so that f(y) = 1. Thus, quite independently of p(x), y obeys a rectangular
distribution law on the range (0, 1).
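
The five transformed values in the normal example can be reproduced without a table. This is a minimal sketch using the error function from the Python standard library; the function name `normal_cdf` is illustrative.

```python
import math

def normal_cdf(x):
    """Area under the standard normal curve to the left of x."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

sample = [-1.5, -1.1, -0.5, 0.6, 1.6]
ys = [round(normal_cdf(x), 3) for x in sample]
print(ys)  # [0.067, 0.136, 0.309, 0.726, 0.945]
```

Each y is simply the probability integral of the assumed p(x) evaluated at the observed x, so a different hypothesized p(x) would give a different set of y's from the same sample.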


This simplicity of the distribution of the quantity y and its independence of
p(x) are most attractive properties. I shall note briefly some of the applications
that have been made in recent years.

In 1936 W. R. Thompson [26] denoted by $p_k$ the probability that in a sample
of N a randomly chosen x will be less than $x_k$, the k-th value observed. Then
the probability that $p' \le p_k \le p''$ is just $p'' - p'$. The probability that exactly
r other members of the sample will be less than $x_k$ is then,

$$\binom{N-1}{r} p_k^{\,r} (1 - p_k)^{N-r-1}.$$

Further for all samples in which just r values occur less than $x_k$, the proportion
of occasions on which $p' \le p_k \le p''$ is given by

$$\int_{p'}^{p''} p^{\,r} (1 - p)^{N-r-1}\,dp \Big/ B(r + 1, N - r),$$

the difference of two incomplete $\beta$-functions. But that there are exactly r observed
x's less than $x_k$ is equivalent to saying that $x_k$ is the (r + 1)-st observation
in order of magnitude, so that in the above we may as well replace r by
k - 1. It is easy to find that the expected value of $p_k$ in such samples is

$$\frac{k}{N + 1}$$

and that the variance is

$$\frac{k(N - k + 1)}{(N + 1)^2 (N + 2)}.$$

It follows from the first of these two expressions that the proportion of occasions on
which $x_k < x < x_{N-k+1}$ is

$$\frac{N + 1 - 2k}{N + 1} \qquad (N + 1 > 2k).$$

Statements of this kind establish confidence limits. Thus if one says that in a sample
of N, an observation at random will fall between the k-th and the (N - k + 1)-st
observations in order of magnitude, such a statement has a probability of
$(N + 1 - 2k)/(N + 1)$ of being true. Or, the integral just above is the fiducial
probability of the truth of $p' \le p_k \le p''$ if in a sample of N the k-th observation
is the (r + 1)-st in order of magnitude. Thompson went on to obtain confidence limits
for the median in a sample of N from any population.
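
The mean, variance and coverage expressions can be checked exactly against the beta density that appears in the integral, with r replaced by k - 1. The following sketch uses exact rational arithmetic; the function names are illustrative and the values k = 3, N = 10 are chosen arbitrarily.

```python
from fractions import Fraction
from math import comb

def order_statistic_moments(k, n):
    """Mean and variance of p_k for the k-th smallest of n observations."""
    mean = Fraction(k, n + 1)
    variance = Fraction(k * (n - k + 1), (n + 1) ** 2 * (n + 2))
    return mean, variance

def coverage(k, n):
    """Chance that a further observation falls between the k-th and (n-k+1)-st values."""
    return Fraction(n + 1 - 2 * k, n + 1)

def beta_moment(k, n, power):
    """E[p^power] under the density p^(k-1) (1-p)^(n-k) / B(k, n-k+1)."""
    def beta(a, b):
        # B(a, b) = (a-1)! (b-1)! / (a+b-1)! for positive integers a and b.
        return Fraction(1, (a + b - 1) * comb(a + b - 2, a - 1))
    return beta(k + power, n - k + 1) / beta(k, n - k + 1)

mean, variance = order_statistic_moments(3, 10)
check_mean = beta_moment(3, 10, 1)
check_variance = beta_moment(3, 10, 2) - check_mean ** 2
```

With k = 3 and N = 10 the mean is 3/11, the variance 2/121, and the coverage between the 3rd and 8th observations is 5/11, which is also the difference of the two expected values 8/11 and 3/11.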

In 1939 Wald and Wolfowitz [27] studied the problem of obtaining confidence 
limits for g(x), the proportion of observations in a sample of N with values 
less than a given x, the population obeying any continuous distribution law.
Their arguments are too complicated to attempt to sketch them here, but they 
are based on the fact that the transformed variable, y, as defined above, is 
rectangularly distributed on the interval (0, 1). With their exact solution they 
gave a more convenient approximate method for calculation in applications. 

In 1938 (I am not being strictly chronological) E. S. Pearson [28] published
a study of test criteria based on this probability integral transformation. Suppose
that we have n independently observed y's, $y_1, y_2, \ldots, y_n$. How should
the y's be used to test the hypothesis that the observations from which the y's
were calculated all came from the same population? K. Pearson [24] had
already suggested the use of $Q = y_1 y_2 \cdots y_n$ or
$Q' = (1 - y_1)(1 - y_2) \cdots (1 - y_n)$. It is known that a simple function of Q or
of Q' obeys a $\chi^2$-distribution with 2n degrees of freedom so that we have a ready
means of combining independent tests based on Q or Q'. But how is one to choose among
Q, Q', or other functions of the y's that might be suggested? E. S. Pearson emphasized
the role that the hypotheses conceived as alternate to the one being tested
should play in making such a choice. He illustrates this in a case of testing
the hypothesis that a sample came from a normal population of zero mean and
unit variance and in which the alternate populations, from one of which the
sample might have been drawn, are such that the corresponding y's calculated
on the hypothesis being tested would follow a Pearson type I distribution law.
Using the likelihood principle he was led in this case to Q or Q', which are then
concluded to be "best possible tests."
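
The combination of independent y's through Q can be sketched as follows. The text does not name the "simple function" of Q; the sketch assumes it is $-2 \log Q$, which is the usual choice and is distributed as $\chi^2$ with 2n degrees of freedom when the y's are independent and rectangular on (0, 1). For an even number of degrees of freedom the upper tail of $\chi^2$ is a finite sum, so no table is needed. The function name `combine_q` is illustrative.

```python
import math

def combine_q(ys):
    """Return (-2 ln Q, upper-tail probability) for Q = y_1 y_2 ... y_n."""
    n = len(ys)
    statistic = -2.0 * sum(math.log(y) for y in ys)
    # Chi-square tail with 2n degrees of freedom:
    # exp(-x/2) * sum_{j<n} (x/2)^j / j!
    half = statistic / 2.0
    tail = math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(n))
    return statistic, tail

# The five y's of the normal example given earlier in the text.
statistic, tail = combine_q([0.067, 0.136, 0.309, 0.726, 0.945])
```

The same function applied to the values 1 - y gives the test based on Q'. With a single y the tail probability returned is y itself, as it should be.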

The final paper I want to discuss is an important one by J. Neyman on the
"Smooth test of goodness of fit," published in 1937 [29]. Suppose again that a
random sample of N values of x gives the set, $y_1, y_2, \ldots, y_N$ on the hypothesis
$H_0$ that the population distribution law is $p(x \mid H_0)$. If $H_0$ is true the y's in
random samples do follow a rectangular distribution on (0, 1). But what would
be the distribution of the y's if the distribution law for the population were
actually $p(x \mid H_1)$? We have for the y's as calculated,

$$y = \int_a^x p(x \mid H_0)\,dx.$$

But to find f(y),

$$f(y)\,dy = p(x \mid H_1)\,dx,$$

so that,

$$f(y) = \frac{p(x \mid H_1)}{p(x \mid H_0)}.$$

Therefore if $H_0$ is not true, the y's calculated on the assumption that it is may
be expected to exhibit a statistically significant set of deviations from a
rectangular distribution.

As Neyman remarks, it is a defect of the $\chi^2$-test of goodness of fit that the
information one has of the algebraic signs of the differences between calculated
and observed frequencies, particularly of the way in which positive and negative
differences succeed each other, is completely unused. And in forming a test of a
statistical hypothesis it is now well understood, thanks to Neyman and Pearson,
that due account should be taken of the alternate hypotheses conceivably true.

Neyman begins by specifying a wide class of alternate hypotheses in a form
that lends itself to mathematical treatment. This is done by assuming that the
distribution of y's calculated for $H_0$ will, if an alternate $H_1$ is true, be given by a
function of the form,

$$p(y \mid \theta_1, \theta_2, \ldots, \theta_k) = c\, e^{\sum_{i=1}^{k} \theta_i \pi_i(y)},$$

in which $\pi_i(y)$ is a polynomial of degree i (a transformed Legendre polynomial)
with convenient properties. For low values of k, such as will ordinarily be used,
this permits alternate distribution curves to deviate in a smooth manner from
the distribution tested, with a limited number of intersections with it.

Now the problem is to determine the function of the observed y's which will
provide a suitable test of $H_0$ with respect to the alternate hypotheses of order or
class k, k having been decided upon in advance of making the test. The mathematics,
proceeding along Neyman and Pearson lines, shows that the appropriate
function, for large samples at least, is simply $\sum_{i=1}^{k} u_i^2$ in which,

$$u_i = \frac{1}{\sqrt{N}} \sum_{j=1}^{N} \pi_i(y_j),$$

the $y_j$'s being calculated from the sample. Moreover, the probability that the
sum $\sum_{i=1}^{k} u_i^2$ exceeds a given value is at once obtained from a table of the
incomplete $\Gamma$-function, i.e., this sum is proportional to a $\chi^2$.
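
The arithmetic of the statistic is short enough to sketch. It is assumed here that the $\pi_i(y)$ are the Legendre polynomials carried from (-1, 1) to (0, 1) and scaled so that the integral of $\pi_i^2$ over (0, 1) is unity, i.e. $\pi_i(y) = \sqrt{2i + 1}\, P_i(2y - 1)$; the function names are illustrative, and the five y's are those of the normal example earlier in the text, far too few for the large-sample theory but enough to show the computation.

```python
import math

def legendre_on_unit_interval(order, y):
    """Values pi_1(y), ..., pi_order(y) of the orthonormal Legendre polynomials on (0, 1)."""
    x = 2.0 * y - 1.0
    prev, cur = 1.0, x  # P_0 and P_1 on (-1, 1)
    values = []
    for n in range(1, order + 1):
        values.append(math.sqrt(2 * n + 1) * cur)
        # Three-term recurrence: (n + 1) P_{n+1} = (2n + 1) x P_n - n P_{n-1}.
        prev, cur = cur, ((2 * n + 1) * x * cur - n * prev) / (n + 1)
    return values

def smooth_test_statistic(ys, order):
    """Sum of u_i^2 with u_i = N^(-1/2) * sum_j pi_i(y_j), for i = 1, ..., order."""
    totals = [0.0] * order
    for y in ys:
        for i, value in enumerate(legendre_on_unit_interval(order, y)):
            totals[i] += value
    return sum(t * t for t in totals) / len(ys)

ys = [0.067, 0.136, 0.309, 0.726, 0.945]
psi_squared = smooth_test_statistic(ys, 2)
```

The order k = 2 used in the last line is an arbitrary choice for the illustration; as the text goes on to note, the choice of k in practice is itself a matter still somewhat in doubt.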

This is a very fine piece of work but, as Neyman points out, there are still 
questions to be settled concerning the general utility of this “smooth test.” 
F. N. David in 1939 [30] further discussed this test. In particular, it may be 
pointed out that the parameters in $p(x \mid H_0)$ must be assumed known; what
would be the effect on the test of estimating these parameters is unknown. A 
reasonably large sample seems to be required to make the developments on the 
assumption of large samples applicable but a y must be calculated for each 
observation. This makes for a good deal of computing but it is not known how 
grouping of observations might be effected. And the matter of the choice of the 
order of the test to be applied, i.e., of a value of k, is still somewhat in doubt. 

I will not debate the proposition that there are papers completely omitted 
from this discussion as important as those I have included however inadequately. 
The limitations of space forced me to choose and it is quite possible that my 
personal tastes and interests had more weight than they should. 


REFERENCES 


[1] H. Hotelling, "Relations between two sets of variates," Biometrika, 28 (1936), 328-377.

[2] S. S. Wilks, "Certain generalizations in the analysis of variance," Biometrika, 24 (1932), 471-494.

[3] R. A. Fisher, "The use of multiple measurements in taxonomic problems," Annals of Eugenics, 7 (1936), 179-188.

[4] H. Hotelling, "The generalization of Student's ratio," Annals of Math. Stat., 2 (1931), 360-378.

[5] R. A. Fisher, "The statistical utilization of multiple measurements," Annals of Eugenics, 8 (1938), 376-386.

[6] R. A. Fisher, "The sampling distribution of some statistics obtained from non-linear equations," Annals of Eugenics, 9 (1939), 238-249.

[7] R. A. Fisher, "The precision of discriminant functions," Annals of Eugenics, 10 (1940), 422-429.

[8] P. C. Mahalanobis, "On the generalized distance in statistics," Proc. Nat. Inst. Sci. Ind., 12 (1936), 49-55.

[9] R. C. Bose, "On the exact distribution of the D² statistic," Sankhya, 2 (1936), 143-154.

[10] R. C. Bose and S. N. Roy, "The exact distribution of the Studentized D² statistic," Sankhya, 3 (1938), part 4.

[11] P. L. Hsu, "On the distribution of roots of certain determinantal equations," Annals of Eugenics, 9 (1939), 250-258.

[12] M. A. Girshick, "On the sampling theory of the roots of determinantal equations," Annals of Math. Stat., 10 (1939), 203-224.

[13] M. S. Bartlett, "Further aspects of the theory of multiple regression," Proc. Camb. Phil. Soc., 34 (1938), 33-40.

[14] M. S. Bartlett, "The statistical significance of canonical correlations," Biometrika, 32 (1941), 29-37.

[15] W. D. Kermack and A. G. McKendrick, "Tests for randomness in a series of numerical observations," Roy. Soc. Edin. Proc., 57 (1937), 228-240.

[16] W. D. Kermack and A. G. McKendrick, "Some distributions associated with a randomly arranged set of numbers," Ibid., 332-376.

[17] W. D. Kermack and A. G. McKendrick, "Some properties of points arranged at random on a Möbius surface," Mathematical Gazette, 22 (1938), 66-72.

[18] A. M. Mood, "The distribution theory of runs," Annals of Math. Stat., 11 (1940), 367-392.

[19] A. Wald and J. Wolfowitz, "On a test whether two samples are from the same population," Annals of Math. Stat., 11 (1940), 147-162.

[20] W. L. Stevens, "Distribution of groups in a sequence of alternatives," Annals of Eugenics, 9 (1939), 10-17.

[21] H. Cramér, "On the composition of elementary errors, Second paper, Statistical applications," Skandinavisk Aktuarietidskrift, 11 (1928), 141-180.

[22] R. von Mises, "Vorlesungen aus dem Gebiete der angewandten Mathematik," Bd. 1: Wahrscheinlichkeitsrechnung, (Leipzig, 1931), 316-335.

[23] R. A. Fisher, Statistical Methods for Research Workers, (Edinburgh, 4th edition, 1932), Article 21.1.

[24] K. Pearson, "On a method of determining whether a sample of size n supposed to have been drawn from a parent population having a known probability integral has probably been drawn at random," Biometrika, 25 (1933), 379-410.

[25] K. Pearson, "On a new method of determining 'goodness of fit,'" Biometrika, 26 (1934), 425-442.

[26] W. R. Thompson, "On confidence ranges for the median and other expectation distributions for populations of unknown distribution form," Annals of Math. Stat., 7 (1936), 122-128.

[27] A. Wald and J. Wolfowitz, "Confidence limits for continuous distribution functions," Annals of Math. Stat., 10 (1939), 105-118.

[28] E. S. Pearson, "The probability integral transformation for testing goodness of fit and combining independent tests of significance," Biometrika, 30 (1938), 134-148.

[29] J. Neyman, "A smooth test of goodness of fit," Skandinavisk Aktuarietidskrift, 20 (1937), 149-199.

[30] F. N. David, "On Neyman's 'smooth' test of goodness of fit. I. Distribution of the criterion ψ² when the hypothesis tested is true," Biometrika, 31 (1939), 191-199.






























NOTES 


This section is devoted to brief research and expository articles, notes on methodology 
and other short items. 




A FURTHER REMARK CONCERNING THE DISTRIBUTION OF THE
RATIO OF THE MEAN SQUARE SUCCESSIVE DIFFERENCE
TO THE VARIANCE¹

By JOHN VON NEUMANN
Institute for Advanced Study²


1. Introduction. In our previous paper¹ it was found convenient to assume that the number $m$ (of the variables of the quadratic form under consideration) is even. (Cf. p. 383, loc. cit.) This means that in the application to the mean square successive difference $n = m + 1$ must be odd. (Cf. p. 389, id.)

In this note we shall show that the distribution for an odd $m$ (i.e. an even $n$) can be expressed by means of the distribution for an even $m$, the latter being already known, loc. cit.

Specifically, consider the distribution of $\gamma = \sum_{\mu=1}^{m} a_\mu x_\mu^2$, if the $x_1, \ldots, x_m$ are equidistributed over the surface $\sum_{\mu=1}^{m} x_\mu^2 = 1$. Denote the $m$-uplet $(a_1, \ldots, a_m)$ by $A$, then the distribution function of $\gamma$ depends on $A$; denote that distribution by $\omega_A(\gamma)$. (Cf. p. 372 id., we write $a_\mu$ for the $B_\mu$ there.)

Now consider an $m$-uplet $A = (a_1, \ldots, a_m)$ and a $p$-uplet $B = (b_1, \ldots, b_p)$ and form the $m + p$-uplet $C = (a_1, \ldots, a_m, b_1, \ldots, b_p)$. Write $C = A + B$. Then we shall show that there exists a simple expression for $\omega_C(\gamma)$ in terms of $\omega_A(\gamma)$ and $\omega_B(\gamma)$.

For the specific application to the mean square successive difference, we can put $n = m + 1$, $A = (\cos(\pi\mu/n)$ for $\mu = 1, \ldots, \tfrac12 n - 1, \tfrac12 n + 1, \ldots, n - 1)$, $B = (0)$, $C = A + B = (\cos(\pi\mu/n)$ for $\mu = 1, \ldots, n - 1)$.





2. The recursion formula. We proceed as follows. $\omega_A(\gamma)$ can also be used to express the joint statistics of

$$\gamma = \sum_{\mu=1}^{m} a_\mu x_\mu^2 \quad \text{and} \quad \rho = \sum_{\mu=1}^{m} x_\mu^2,$$

or better, the volume of that part of the $x_1, \ldots, x_m$-space which corresponds to any given domain in the $\gamma, \rho$-plane. Thus the volume corresponding to a















¹ Cf. the paper by the same author, Annals of Math. Stat., vol. 12 (1941), pp. 367-395.
² Also Scientific Advisory Committee of the Ballistic Research Laboratory, Aberdeen Proving Ground.




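given infinitesimal $\gamma, \rho$ domain $d\gamma\, d\rho$ will clearly be

$$C_m\, \omega_A\!\left(\frac{\gamma}{\rho}\right) d\!\left(\frac{\gamma}{\rho}\right) \cdot \rho^{\frac12 (m-1)}\, d\!\left(\rho^{\frac12}\right),$$

where $C_m$ is the ($m - 1$-dimensional) area of the $x_1, \ldots, x_m$-surface $\sum_{\mu=1}^{m} x_\mu^2 = 1$ (the unit sphere). I.e., this volume is

$$\tfrac12 C_m\, \omega_A\!\left(\frac{\gamma}{\rho}\right) \rho^{\frac12 m - 2}\, d\gamma\, d\rho. \tag{1}$$

Similarly for

$$\zeta = \sum_{\nu=1}^{p} b_\nu u_\nu^2 \quad \text{and} \quad \sigma = \sum_{\nu=1}^{p} u_\nu^2,$$

the volume corresponding to the infinitesimal $\zeta, \sigma$ domain $d\zeta\, d\sigma$ is

$$\tfrac12 C_p\, \omega_B\!\left(\frac{\zeta}{\sigma}\right) \sigma^{\frac12 p - 2}\, d\zeta\, d\sigma. \tag{2}$$

Finally for $\theta = \gamma + \zeta = \sum_{\mu=1}^{m} a_\mu x_\mu^2 + \sum_{\nu=1}^{p} b_\nu u_\nu^2$ and $\tau = \rho + \sigma = \sum_{\mu=1}^{m} x_\mu^2 + \sum_{\nu=1}^{p} u_\nu^2$ the volume corresponding to the infinitesimal $\theta, \tau$ domain $d\theta\, d\tau$ is

$$\tfrac12 C_{m+p}\, \omega_{A+B}\!\left(\frac{\theta}{\tau}\right) \tau^{\frac12 (m+p) - 2}\, d\theta\, d\tau. \tag{3}$$

Now $\theta = \gamma + \zeta$, $\tau = \rho + \sigma$ connect (1), (2), (3) as follows:

$$\tfrac12 C_{m+p}\, \omega_{A+B}\!\left(\frac{\theta}{\tau}\right) \tau^{\frac12 (m+p) - 2} = \int_0^{\tau} d\rho \int_{-\infty}^{\infty} d\gamma \cdot \tfrac12 C_m\, \omega_A\!\left(\frac{\gamma}{\rho}\right) \rho^{\frac12 m - 2} \cdot \tfrac12 C_p\, \omega_B\!\left(\frac{\theta - \gamma}{\tau - \rho}\right) (\tau - \rho)^{\frac12 p - 2}.$$

This gives (either by simply putting $\tau = 1$, or else by replacing $\theta, \gamma, \rho$ by $\tau\theta, \tau\gamma, \tau\rho$)

$$\omega_{A+B}(\theta) = \frac{C_m C_p}{2 C_{m+p}} \int_0^1 d\rho \int_{-\infty}^{\infty} d\gamma \cdot \omega_A\!\left(\frac{\gamma}{\rho}\right) \omega_B\!\left(\frac{\theta - \gamma}{1 - \rho}\right) \rho^{\frac12 m - 2} (1 - \rho)^{\frac12 p - 2}.$$

To determine $\dfrac{C_m C_p}{2 C_{m+p}}$ apply to this $\int d\theta \cdots$. Then

$$1 = \frac{C_m C_p}{2 C_{m+p}} \int_0^1 d\rho \cdot \rho^{\frac12 m - 1} (1 - \rho)^{\frac12 p - 1} = \frac{C_m C_p}{2 C_{m+p}}\, B\!\left(\tfrac12 m, \tfrac12 p\right) = \frac{C_m C_p}{2 C_{m+p}} \cdot \frac{\Gamma(\tfrac12 m)\, \Gamma(\tfrac12 p)}{\Gamma(\tfrac12 (m + p))}.$$

Accordingly:

$$\omega_{A+B}(\theta) = \frac{\Gamma(\tfrac12 (m + p))}{\Gamma(\tfrac12 m)\, \Gamma(\tfrac12 p)} \int_0^1 d\rho \int_{-\infty}^{\infty} d\gamma \cdot \omega_A\!\left(\frac{\gamma}{\rho}\right) \omega_B\!\left(\frac{\theta - \gamma}{1 - \rho}\right) \rho^{\frac12 m - 2} (1 - \rho)^{\frac12 p - 2}. \tag{I}$$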
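The normalizing constant obtained above can be checked numerically. The sketch below assumes the standard closed form $C_m = 2\pi^{m/2}/\Gamma(\tfrac12 m)$ for the area of the unit sphere, which is not stated in the text, and compares $C_m C_p / (2 C_{m+p})$ with the ratio of $\Gamma$-functions. The function names are hypothetical.

```python
import math


def sphere_area(m):
    """Area C_m of the unit sphere in m dimensions: 2*pi^(m/2) / Gamma(m/2)."""
    return 2.0 * math.pi ** (m / 2.0) / math.gamma(m / 2.0)


def constant_from_areas(m, p):
    """C_m * C_p / (2 * C_{m+p}), the factor left after putting tau = 1."""
    return sphere_area(m) * sphere_area(p) / (2.0 * sphere_area(m + p))


def constant_from_gammas(m, p):
    """Gamma((m+p)/2) / (Gamma(m/2) * Gamma(p/2)), the reciprocal Beta function."""
    return math.gamma((m + p) / 2.0) / (math.gamma(m / 2.0) * math.gamma(p / 2.0))


if __name__ == "__main__":
    for m, p in [(2, 1), (4, 1), (6, 3)]:
        print(m, p, constant_from_areas(m, p), constant_from_gammas(m, p))
```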







3. The special case. Let us now return to the special case mentioned at the end of 1, the application to the mean square successive difference.

There $p = 1$ and $B = (0)$, so that the "distribution" of $\zeta$ is concentrated at the point 0. Hence $\omega_B(\zeta)$ is an "improper" distribution, concentrated in the same way.³ Using $C$ and $A$ as described at the end of 1, the above formula becomes (now $m = n - 2$, $p = 1$)

$$\omega_{A+(0)}(\theta) = \frac{\Gamma(\tfrac12 (n - 1))}{\Gamma(\tfrac12 (n - 2))\, \Gamma(\tfrac12)} \int_0^1 d\rho \cdot \omega_A\!\left(\frac{\theta}{\rho}\right) \rho^{\frac12 n - 3} (1 - \rho)^{-\frac12}. \tag{II}$$

It would have been equally easy, of course, to establish (II) directly.

Putting $\rho = 1/t$ gives

$$\omega_{A+(0)}(\theta) = \frac{\Gamma(\tfrac12 (n - 1))}{\Gamma(\tfrac12 (n - 2))\, \Gamma(\tfrac12)} \int_1^{\infty} dt \cdot \omega_A(\theta t)\, t^{-\frac12 (n - 3)} (t - 1)^{-\frac12}. \tag{III}$$

Since $\omega_A(\gamma)$ vanishes for $|\gamma| > \cos(\pi/n)$, we may replace this integral $\int_1^{\infty}$ by $\int_1^{\cos(\pi/n)/|\theta|}$.

Formula (III) can be used for numerical work, and also to extend the formula (3) on p. 391, loc. cit., to even values of n.
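One way to read formula (II) is that $\theta$ is distributed like a product $\rho\gamma$, where $\gamma$ follows $\omega_A$ and $\rho$ is an independent Beta variable with parameters $\tfrac12 (n - 2)$ and $\tfrac12$. This reading is an interpretation, not a statement made in the text. The sketch below compares it by simulation with direct sampling on the unit sphere for $n = 6$, using the second moment $4/35$ that follows from the standard moments of the uniform distribution on a sphere. Function names, seeds and the number of trials are arbitrary choices.

```python
import math
import random


def a_values(n):
    """cos(pi*mu/n) for mu = 1..n-1 with mu = n/2 (where the cosine is 0) left out."""
    return [math.cos(math.pi * mu / n) for mu in range(1, n) if 2 * mu != n]


def sample_direct(coeffs, rng):
    """sum(a_mu * x_mu^2) for x uniform on the unit sphere of dimension len(coeffs)."""
    g = [rng.gauss(0.0, 1.0) for _ in coeffs]
    norm2 = sum(v * v for v in g)
    return sum(a * v * v for a, v in zip(coeffs, g)) / norm2


def sample_via_formula_II(A, rng):
    """rho * gamma with gamma drawn from omega_A and rho ~ Beta(m/2, 1/2)."""
    gamma = sample_direct(A, rng)
    rho = rng.betavariate(len(A) / 2.0, 0.5)
    return rho * gamma


def second_moment(sampler, trials, seed):
    rng = random.Random(seed)
    return sum(sampler(rng) ** 2 for _ in range(trials)) / trials


n = 6
A = a_values(n)          # m = n - 2 coefficients
C = A + [0.0]            # A + (0), i.e. all n - 1 cosines
direct = second_moment(lambda rng: sample_direct(C, rng), 20000, 1)
mixed = second_moment(lambda rng: sample_via_formula_II(A, rng), 20000, 2)

if __name__ == "__main__":
    print(direct, mixed, 4.0 / 35.0)
```

Both estimates should lie close to $4/35 \approx 0.114$; agreement of a single moment is of course only a consistency check, not a verification of the whole distribution.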
l 


CONVEXITY PROPERTIES OF GENERALIZED MEAN VALUE 
FUNCTIONS 


By E. F. BECKENBACH


University of Michigan 


In an article appearing in the Annals of Mathematical Statistics¹ it was pointed out that while the mean value functions appearing below have been studied and used since 1840, there appeared to have been no attempt made to investigate the behavior of their second derivatives.

Consider (1) the unit weight or simple sample form

$$\phi(t) = \left(\frac{x_1^t + x_2^t + \cdots + x_n^t}{n}\right)^{1/t},$$

in which the $x_i$ are positive numbers and in which $t$ may take any real value; (2) the weighted sample form

$$\psi(t) = \left(\frac{c_1 x_1^t + c_2 x_2^t + \cdots + c_n x_n^t}{c_1 + c_2 + \cdots + c_n}\right)^{1/t},$$

³ Dirac's famous "delta function." It could be described by a Stieltjes integral.
¹ Nilan Norris, "Convexity properties of generalized mean value functions," Annals of Math. Stat., Vol. 8 (1937), pp. 118-120.










in which the $c_i$ are positive numbers, and in which the $x_i$ and $t$ are restricted as in $\phi(t)$; and (3) the integral form

$$\theta(t) = \left(\frac{1}{x_2 - x_1} \int_{x_1}^{x_2} [f(x)]^t\, dx\right)^{1/t},$$

in which $f(x)$ is a positive continuous function for $x_1 \le x \le x_2$.

Since the analysis and results are essentially the same in all three cases, we restrict our attention to $\theta(t)$.

As is well known,² $\theta(t)$ is a monotone non-decreasing function which varies from the minimum of $f(x)$ to the maximum of $f(x)$ as $t$ increases from $-\infty$ to $+\infty$. It is further of some importance to study the rate at which the rate of increase of this type bias is changing as $t$ increases; the rate in question is given by the second derivative $\theta''(t)$.

The following points were made by Norris, loc. cit.: (1) Since, as we have pointed out, $\theta(t)$ has two horizontal asymptotes, $\theta(t)$ must have at least one inflection point. (2) Consideration of a simple example shows that there is not necessarily an inflection point at $t = 0$; $\theta''(0)$ can be made to take on any real value.

Thus it is not true that $\theta''(t)$ must be positive for all $t < 0$ and negative for all $t > 0$. On the other hand, we shall give simple bounds for $\theta''(t)$ in the other direction; namely, we shall give a positive upper bound of $\theta''(t)$ for $t < 0$ and a lower bound for $t > 0$. These bounds are precise in the sense that they are actually taken on in the special case $f(x) \equiv$ const. Their main advantage lies in the fact that while the expression for $\theta''(t)$ is quite involved, these bounds are simple expressions in the quantities $\theta(t)$ and $\theta'(t)$ which might already have been computed.

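Let

$$\lambda(t) = \log \theta(t).$$

Differentiating, we obtain

$$\lambda'(t) = \frac{\theta'(t)}{\theta(t)} = \frac{1}{t} \cdot \frac{\int_{x_1}^{x_2} [f(x)]^t \log f(x)\, dx}{\int_{x_1}^{x_2} [f(x)]^t\, dx} - \frac{1}{t^2} \log\left(\frac{1}{x_2 - x_1} \int_{x_1}^{x_2} [f(x)]^t\, dx\right).$$

It follows³ that

$$\lambda'(t) \ge 0$$

and

$$\theta'(t) \ge 0.$$

Let

$$\mu(t) = t^2 \lambda'(t).$$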

² See for instance, G. Pólya und G. Szegö, Aufgaben und Lehrsätze aus der Analysis (Berlin, 1925), Vol. 1, pp. 54-55 and 210-211.
³ See G. Pólya und G. Szegö, loc. cit., p. 210.






















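Curiously, while $\lambda''(t)$ and $\theta''(t)$ appear to be rather formidable, the closely related quantity $\mu'(t)$ is made relatively simple by the fact that two of the terms obtained by formal differentiation are negatives of each other; and Schwarz' inequality can be applied to the remaining terms, as follows.

We obtain

$$\left(\int_{x_1}^{x_2} [f(x)]^t\, dx\right)^2 \mu'(t) = t \left\{ \left(\int_{x_1}^{x_2} [f(x)]^t\, dx\right) \left(\int_{x_1}^{x_2} [f(x)]^t [\log f(x)]^2\, dx\right) - \left(\int_{x_1}^{x_2} [f(x)]^t \log f(x)\, dx\right)^2 \right\}.$$

By Schwarz' inequality,⁴ it follows that

$$\mu'(t) = t\, \pi(t),$$

with

$$\pi(t) \ge 0,$$

the sign of equality holding if and only if $f(x) \equiv$ const.

From the definition of $\mu(t)$ we obtain

$$\mu'(t) = t\,[2\lambda'(t) + t\lambda''(t)] = \frac{t}{\theta(t)} \left[ 2\theta'(t) + t\theta''(t) - \frac{t[\theta'(t)]^2}{\theta(t)} \right],$$

whence

$$2\lambda'(t) + t\lambda''(t) = \frac{1}{\theta(t)} \left[ 2\theta'(t) + t\theta''(t) - \frac{t[\theta'(t)]^2}{\theta(t)} \right] = \pi(t) \ge 0;$$

that is,

$$t\lambda''(t) \ge -2\lambda'(t), \qquad t\theta''(t) \ge \frac{t[\theta'(t)]^2}{\theta(t)} - 2\theta'(t).$$

It follows that for $t < 0$, we have

$$\lambda''(t) \le -2\lambda'(t)/t$$

and

$$\theta''(t) \le \frac{[\theta'(t)]^2}{\theta(t)} - \frac{2\theta'(t)}{t},$$

while for $t > 0$, we have

$$\lambda''(t) \ge -2\lambda'(t)/t$$

and

$$\theta''(t) \ge \frac{[\theta'(t)]^2}{\theta(t)} - \frac{2\theta'(t)}{t}.$$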
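The bounds can be illustrated numerically. The sketch below works with the simple sample form $\phi(t)$ rather than the integral form, relying on the remark that the analysis is essentially the same in all three cases, and estimates the derivatives by central differences. The sample values, the step size and the function names are arbitrary choices made for illustration.

```python
def mean_value(xs, t):
    """Simple sample form phi(t) = (sum(x_i^t) / n)^(1/t), for t != 0."""
    return (sum(x ** t for x in xs) / len(xs)) ** (1.0 / t)


def derivatives(xs, t, h=1e-4):
    """Central-difference estimates of phi(t), phi'(t) and phi''(t)."""
    lo, mid, hi = mean_value(xs, t - h), mean_value(xs, t), mean_value(xs, t + h)
    return mid, (hi - lo) / (2.0 * h), (hi - 2.0 * mid + lo) / (h * h)


def bound(xs, t):
    """The quantity phi'(t)^2 / phi(t) - 2 * phi'(t) / t appearing in the bounds."""
    phi, d1, _ = derivatives(xs, t)
    return d1 * d1 / phi - 2.0 * d1 / t


if __name__ == "__main__":
    xs = [1.0, 2.0, 5.0]
    for t in (-2.0, -0.5, 0.5, 2.0):
        _, _, d2 = derivatives(xs, t)
        relation = "<=" if t < 0 else ">="
        print(f"t={t:+.1f}: phi''={d2:.5f} {relation} {bound(xs, t):.5f}")
```

For negative $t$ the estimated second derivative should fall below the bound, and for positive $t$ above it; with constant data the two coincide.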








⁴ See G. Pólya und G. Szegö, loc. cit., p. 54.









A CHARACTERIZATION OF THE NORMAL DISTRIBUTION 


By EUGENE LUKACS
Baltimore, Md. 


1. In sampling from a normal population the distributions of the mean and 
of the variance are mutually independent. This well known property of the 
normal distribution is used in deriving the distribution of ‘‘Student’s”’ ratio. 
The independence of the distributions of the mean and of the variance charac- 
terizes the normal distribution. To show this one has to prove the following 
statement: 

A necessary and sufficient condition for the normality of the parent distribution 
is that the sampling distributions of the mean and of the variance be independent. 

That this condition is necessary follows from the above mentioned property of the normal distribution; so it remains only to prove that this condition is sufficient. This was first proved by R. C. Geary¹ by using some of R. A. Fisher's general formulae for the seminvariants. However, a different proof, using characteristic functions, might be of some interest.


2. Let $f(x)$ be the density function of a continuous probability distribution and let $x_1, x_2, \ldots, x_n$ be $n$ observations of the variate $x$. Denote by $\bar{x} = \sum_{\alpha=1}^{n} x_\alpha / n$ the sample mean, and by

$$s^2 = \sum_{\alpha=1}^{n} (x_\alpha - \bar{x})^2 / n = \Big[(n - 1) \sum_{\alpha=1}^{n} x_\alpha^2 - 2 \sum_{\alpha < \beta} x_\alpha x_\beta\Big] \Big/ n^2$$

the sample variance of these observations. The characteristic function of the distribution is then given by

$$\psi(t) = \int e^{itx} f(x)\, dx. \tag{1}$$

The characteristic function of the joint distribution of the statistics $\bar{x}$ and $s^2$ is known to be

$$\varphi(t_1, t_2) = \int \cdots \int e^{i t_1 \bar{x} + i t_2 s^2} f(x_1) \cdots f(x_n)\, dx_1 \cdots dx_n. \tag{2}$$

In the same way one obtains the characteristic function of the mean $\bar{x}$ as

$$\varphi_1(t_1) = \varphi(t_1, 0) = \int \cdots \int e^{i t_1 \bar{x}} f(x_1) \cdots f(x_n)\, dx_1 \cdots dx_n, \tag{2a}$$


¹ R. C. Geary, "Distribution of Student's ratio for nonnormal samples," Roy. Stat. Soc. Jour., Supp. Vol. 3, no. 2.










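and the characteristic function of the distribution of the variance $s^2$

$$\varphi_2(t_2) = \varphi(0, t_2) = \int \cdots \int e^{i t_2 s^2} f(x_1) \cdots f(x_n)\, dx_1 \cdots dx_n. \tag{2b}$$

The independence of the distributions of $\bar{x}$ and $s^2$ means in terms of the characteristic functions $\varphi(t_1, t_2) = \varphi_1(t_1) \varphi_2(t_2)$, or

$$\left. \frac{\partial \varphi(t_1, t_2)}{\partial t_2} \right|_{t_2 = 0} = \varphi_1(t_1) \left. \frac{d \varphi_2(t_2)}{d t_2} \right|_{t_2 = 0}. \tag{3}$$

Substituting in (2a) $\bar{x} = \sum_{\alpha=1}^{n} x_\alpha / n$, it is seen

$$\varphi_1(t_1) = \prod_{\alpha=1}^{n} \int e^{i t_1 x_\alpha / n} f(x_\alpha)\, dx_\alpha = \left[ \int e^{i t_1 x / n} f(x)\, dx \right]^n = [\psi(t_1/n)]^n;$$

therefore

$$\left. \frac{\partial \varphi(t_1, t_2)}{\partial t_2} \right|_{t_2 = 0} = [\psi(t_1/n)]^n \left. \frac{d \varphi_2(t_2)}{d t_2} \right|_{t_2 = 0}. \tag{3'}$$

Differentiating (2) with respect to $t_2$

$$\frac{\partial \varphi(t_1, t_2)}{\partial t_2} = i \int \cdots \int s^2 e^{i t_1 \bar{x} + i t_2 s^2} f(x_1) \cdots f(x_n)\, dx_1 \cdots dx_n.$$

Substituting $s^2 = [(n - 1) \sum_{\alpha} x_\alpha^2 - 2 \sum_{\alpha < \beta} x_\alpha x_\beta] / n^2$ and $\bar{x} = \sum_{\alpha} x_\alpha / n$ we obtain easily

$$\left. \frac{\partial \varphi(t_1, t_2)}{\partial t_2} \right|_{t_2 = 0} = i\, \frac{n - 1}{n} \left\{ [\psi(t_1/n)]^{n-1} \int x^2 e^{i t_1 x / n} f(x)\, dx - [\psi(t_1/n)]^{n-2} \left[ \int x e^{i t_1 x / n} f(x)\, dx \right]^2 \right\}. \tag{4a}$$

In a similar way it is seen

$$\left. \frac{d \varphi_2(t_2)}{d t_2} \right|_{t_2 = 0} = i\, \frac{n - 1}{n}\, \sigma^2. \tag{4b}$$

Here $\sigma^2$ denotes the population variance of the parent distribution. Substituting (4a) and (4b) in the relation (3') and writing $t = t_1/n$ one has

$$\psi(t) \int x^2 e^{itx} f(x)\, dx - \left[ \int x e^{itx} f(x)\, dx \right]^2 = [\psi(t)]^2 \sigma^2. \tag{5}$$

Considering the definition (1) of the characteristic function it is seen that

$$\frac{d^k \psi}{dt^k} = i^k \int x^k e^{itx} f(x)\, dx. \tag{6}$$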
















The integrals on the left side of relation (5) are of this form. So one may write the relation expressing statistical independence of the sample mean and the sample variance as a differential equation for the characteristic function $\psi(t)$, namely

$$-\psi\, \frac{d^2 \psi}{dt^2} + \left( \frac{d\psi}{dt} \right)^2 = \sigma^2 \psi^2. \tag{7}$$

The initial conditions to be satisfied are

$$\psi(0) = 1, \qquad \psi'(0) = i\mu,$$

where $\mu$ is the population mean of the parent distribution. Integrating this equation it is seen that the characteristic function is

$$\psi(t) = e^{i\mu t - \frac12 \sigma^2 t^2}, \tag{8}$$

which is the characteristic function of the normal distribution.
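The differential equation for the characteristic function can be checked at individual points. The sketch below evaluates the residual $-\psi\psi'' + (\psi')^2 - \sigma^2\psi^2$ for the normal characteristic function, where it should vanish, and, as a contrast chosen here for illustration, for the unit exponential law with $\psi(t) = 1/(1 - it)$ and $\sigma^2 = 1$, where it does not. Derivatives are written out analytically; the function names are hypothetical.

```python
import cmath


def normal_cf(t, mu, sigma2):
    """(psi, psi', psi'') for psi(t) = exp(i*mu*t - sigma2*t^2/2)."""
    psi = cmath.exp(1j * mu * t - 0.5 * sigma2 * t * t)
    g1 = 1j * mu - sigma2 * t            # derivative of log psi
    return psi, g1 * psi, (g1 * g1 - sigma2) * psi


def exponential_cf(t):
    """(psi, psi', psi'') for the unit exponential law, psi(t) = 1/(1 - i*t)."""
    u = 1.0 / (1.0 - 1j * t)
    return u, 1j * u ** 2, -2.0 * u ** 3


def residual(cf_values, sigma2):
    """-psi*psi'' + psi'^2 - sigma2*psi^2; zero when the equation holds."""
    psi, d1, d2 = cf_values
    return -psi * d2 + d1 * d1 - sigma2 * psi * psi


if __name__ == "__main__":
    print(abs(residual(normal_cf(0.7, 1.5, 2.0), 2.0)))
    print(abs(residual(exponential_cf(1.0), 1.0)))
```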


3. This reasoning applies also to the multivariate case. Let $f(x_1, x_2, \ldots, x_p)$ be the density of the $p$ variates $x_1, x_2, \ldots, x_p$. Denote by $x_{k\alpha}$ ($k = 1, 2, \ldots, p$; $\alpha = 1, 2, \ldots, n$) the $\alpha$-th observation on the $k$-th variate, by $\bar{x}_k$ the sample mean of this variate and by $s_{kl}$ the sample covariance between the $k$-th and $l$-th variates. Assuming that the distribution of $s_{lm}$ is independent of the joint distribution of the $p$ sample means $(\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_p)$ one obtains the equation

$$-\psi\, \psi_{lm} + \psi_l \psi_m = \sigma_{lm} \psi^2. \tag{9}$$

Here $\sigma_{lm}$ is the population covariance of the variates $x_l$ and $x_m$,

$$\psi = \psi(t_1, \ldots, t_p) = \int \cdots \int e^{i(t_1 x_1 + \cdots + t_p x_p)} f(x_1, \ldots, x_p)\, dx_1 \cdots dx_p$$

denotes the characteristic function of the parent distribution and

$$\psi_l = \frac{\partial \psi}{\partial t_l}, \qquad \psi_{lm} = \frac{\partial^2 \psi}{\partial t_l\, \partial t_m}.$$

If (9) holds for $l, m = 1, 2, \ldots, p$ one has a system of partial differential equations which leads to the characteristic function of the multivariate normal distribution.





















NOTE ON A METHOD OF SAMPLING 
By CARLOS E. DIEULEFAIT


National University of Litoral, Argentina 


Olds¹ has considered the following problem: Given a lot of size $m = s + r$ containing $s$ items of a specified kind. Items are drawn without replacement until $j$ of the $s$ items have been drawn. The problem is to determine the probability law of $n$, the number of drawings which have to be made. In the present note, we shall consider a certain limiting form for the probability function of $n$ and make some remarks concerning repeated sampling of this type.

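If $n$ is the size of a drawing, $j \le n \le r + j$, its probability law $P(n)$ is given by:

$$P(n) = \binom{r}{n - j} \frac{\Gamma(s + 1)}{\Gamma(j)\, \Gamma(s - j + 1)} \int_0^1 x^{n-1} (1 - x)^{m-n}\, dx.$$

The characteristic function of $n$ is

$$\varphi(t, n) = \sum_{n=j}^{r+j} P(n) e^{itn} = \frac{\Gamma(s + 1)}{\Gamma(j)\, \Gamma(s - j + 1)}\, e^{itj} \int_0^1 x^{j-1} (1 - x)^{s-j} (1 - x + x e^{it})^r\, dx.$$

Differentiating we find

$$\frac{\varphi'(t, n)}{\varphi(t, n)} = ij + i r e^{it}\, \frac{\int_0^1 x^{j} (1 - x)^{s-j} (1 - x + x e^{it})^{r-1}\, dx}{\int_0^1 x^{j-1} (1 - x)^{s-j} (1 - x + x e^{it})^{r}\, dx} \tag{1}$$

and hence

$$m_1(n) = \sum_{n=j}^{r+j} P(n)\, n = \frac{1}{i} \left[ \frac{\varphi'(t, n)}{\varphi(t, n)} \right]_{t=0} = \frac{j(m + 1)}{s + 1}.$$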
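The probability law and its mean can be checked with exact rational arithmetic. The sketch below assumes the familiar combinatorial form $P(n) = \binom{n-1}{j-1}\binom{m-n}{s-j} / \binom{m}{s}$ for this sampling scheme, which is not written out in the text, and compares it with the integral form by using $\int_0^1 x^{n-1}(1-x)^{m-n}\,dx = (n-1)!\,(m-n)!/m!$. The variance is compared with the closed form $j(m+1)(m-s)(s-j+1)/[(s+1)^2(s+2)]$. Function names and the numerical case are illustrative only.

```python
from fractions import Fraction
from math import comb, factorial


def pmf_combinatorial(n, m, s, j):
    """P(n) = C(n-1, j-1) * C(m-n, s-j) / C(m, s) (assumed equivalent form)."""
    return Fraction(comb(n - 1, j - 1) * comb(m - n, s - j), comb(m, s))


def pmf_integral_form(n, m, s, j):
    """P(n) from the Beta-integral form, with the integral evaluated in closed form."""
    r = m - s
    gamma_ratio = Fraction(factorial(s), factorial(j - 1) * factorial(s - j))
    beta_integral = Fraction(factorial(n - 1) * factorial(m - n), factorial(m))
    return comb(r, n - j) * gamma_ratio * beta_integral


def mean_and_variance(m, s, j):
    """Exact mean and variance of n over its support j, ..., r + j."""
    r = m - s
    probs = {n: pmf_combinatorial(n, m, s, j) for n in range(j, r + j + 1)}
    mean = sum(n * p for n, p in probs.items())
    var = sum((n - mean) ** 2 * p for n, p in probs.items())
    return mean, var


if __name__ == "__main__":
    m, s, j = 12, 5, 2
    print(mean_and_variance(m, s, j), Fraction(j * (m + 1), s + 1))
```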
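For the calculation of moments about the mean we take

$$\varphi(t, n - m_1) = e^{-i m_1 t}\, \varphi(t, n), \tag{2}$$

from which we obtain

$$\frac{1}{i^k} \left[ \varphi^{(k)}(t, n - m_1) \right]_{t=0} = \sum_{n=j}^{r+j} P(n) (n - m_1)^k = \mu_k(n).$$

In particular, $\mu_2 = \dfrac{j(m + 1)(m - s)(s - j + 1)}{(s + 1)^2 (s + 2)}$. The values of $m_1(n)$ and $\mu_2(n)$ have already been given by Olds using another method. Putting $\dfrac{rj}{s + 1} = \beta$, we have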





¹ E. G. Olds, Annals of Math. Stat., Vol. 11 (1940), p. 355.




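$$\mu_2 = \beta(1 - \beta) + \frac{r^{(2)} (j, 2)}{(s + 1, 2)},$$

$$\mu_3 = 3(1 - \beta)\mu_2 - \beta(2 - 3\beta + \beta^2) + \frac{r^{(3)} (j, 3)}{(s + 1, 3)},$$

$$\mu_4 = (6 - 4\beta)\mu_3 - (11 - 18\beta + 6\beta^2)\mu_2 + \beta(6 - 11\beta + 6\beta^2 - \beta^3) + \frac{r^{(4)} (j, 4)}{(s + 1, 4)},$$

where $r^{(k)} = r(r - 1) \cdots (r - k + 1)$, $(j, k) = j(j + 1) \cdots (j + k - 1)$.²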
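Relations of this kind between the central moments and the quantities $r^{(k)}(j, k)/(s + 1, k)$ can be verified exactly for a particular lot. The sketch below takes $\beta = rj/(s + 1)$ and the combinatorial form of $P(n)$ as assumptions, and uses the standard conversion between powers and descending factorials, under which the signs are as written in the assertions of the usage example. Function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def falling(r, k):
    """r^(k) = r (r - 1) ... (r - k + 1)."""
    out = 1
    for i in range(k):
        out *= r - i
    return out


def rising(j, k):
    """(j, k) = j (j + 1) ... (j + k - 1)."""
    out = 1
    for i in range(k):
        out *= j + i
    return out


def central_moment(m, s, j, k):
    """Exact k-th moment of n about its mean."""
    r = m - s
    probs = {n: Fraction(comb(n - 1, j - 1) * comb(m - n, s - j), comb(m, s))
             for n in range(j, r + j + 1)}
    mean = sum(n * p for n, p in probs.items())
    return sum((n - mean) ** k * p for n, p in probs.items())


def factorial_term(m, s, j, k):
    """r^(k) (j, k) / (s + 1, k) with r = m - s."""
    return Fraction(falling(m - s, k) * rising(j, k), rising(s + 1, k))


if __name__ == "__main__":
    m, s, j = 12, 5, 2
    b = Fraction((m - s) * j, s + 1)
    mu2, mu3 = central_moment(m, s, j, 2), central_moment(m, s, j, 3)
    print(mu2 == b * (1 - b) + factorial_term(m, s, j, 2))
    print(mu3 == 3 * (1 - b) * mu2 - b * (2 - 3 * b + b ** 2) + factorial_term(m, s, j, 3))
```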
We can obtain a limiting form for P(n) in the following way: 


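Since

$$\varphi\!\left(t, \frac{n - j}{r}\right) = e^{-itj/r}\, \varphi\!\left(\frac{t}{r}, n\right),$$

we find

$$\varphi\!\left(t, \frac{n - j}{r}\right) = \frac{\Gamma(s + 1)}{\Gamma(j)\, \Gamma(s - j + 1)} \int_0^1 x^{j-1} (1 - x)^{s-j} (1 - x + x e^{it/r})^r\, dx.$$

Therefore

$$\lim_{r \to \infty} \varphi\!\left(t, \frac{n - j}{r}\right) = \int_0^1 L(x) e^{itx}\, dx, \tag{3}$$

where

$$L(x) = \frac{\Gamma(s + 1)}{\Gamma(j)\, \Gamma(s - j + 1)}\, x^{j-1} (1 - x)^{s-j}.$$

The interpretation of (3) is that the distribution $\left\{ \dfrac{n - j}{r}, P(n) \right\}$ has as its limiting form the distribution $\{x, L(x)\}$ as $r \to \infty$.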
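Letting $n_1, n_2, \ldots, n_w$ be a sample of size $w$ and $\bar{n}$ the mean, $\bar{n} = \dfrac{1}{w} \sum_{i=1}^{w} n_i$. For the characteristic function of $\bar{n}$ we have

$$\varphi(t, \bar{n}) = \sum \prod_{i=1}^{w} P(n_i) e^{i t n_i / w} = \prod_{i=1}^{w} \varphi\!\left(\frac{t}{w}, n\right) = \left[ \varphi\!\left(\frac{t}{w}, n\right) \right]^w,$$

and hence

$$\frac{\varphi'(t, \bar{n})}{\varphi(t, \bar{n})} = \left[ \frac{\varphi'(t, n)}{\varphi(t, n)} \right]_{t = t/w}.$$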

² For an easy symbolical method of calculation cf. C. Dieulefait, Comptes Rendus, Vol. 208, p. 145.







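For $t = 0$ we have $m_1(\bar{n}) = m_1(n)$. But:

$$\frac{d^a}{dt^a} \left[ \frac{\varphi'(t, \bar{n})}{\varphi(t, \bar{n})} \right] = \frac{1}{w^a} \left[ \frac{d^a}{dt^a}\, \frac{\varphi'(t, n)}{\varphi(t, n)} \right]_{t = t/w}.$$

Then for $t = 0$ we arrive at

$$\lambda_{a+1}(\bar{n}) = \frac{\lambda_{a+1}(n)}{w^a},$$

where $\lambda_{a+1}$ denotes the semi-invariant of order $a + 1$. For $a = 1$, we have

$$\mu_2(\bar{n}) = \frac{\mu_2(n)}{w}$$

and this leads us to

$$\sigma_{\bar{n}} = \sqrt{\frac{j(m + 1)(m - s)(s - j + 1)}{w (s + 1)^2 (s + 2)}}.$$

By the Tchebycheff theorem we obtain

$$P(|\bar{n} - m_1(n)| < l \sigma_{\bar{n}}) > 1 - \frac{1}{l^2}.$$

We can take $l$ and $w$ as large as we please; then we have the following stochastic limit

$$\lim_{w \to \infty} \bar{n} = m_1(n).$$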


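Now, we have

$$\Phi_w(t) = \varphi\!\left(t, \frac{\bar{n} - m_1}{\sigma_{\bar{n}}}\right) = e^{-i m_1 t / \sigma_{\bar{n}}} \left[ \varphi\!\left(\frac{t}{w \sigma_{\bar{n}}}, n\right) \right]^w,$$

$$\frac{\Phi_w'(t)}{\Phi_w(t)} = -\frac{i m_1}{\sigma_{\bar{n}}} + \frac{1}{\sigma_{\bar{n}}} \left[ \frac{\varphi'(t, n)}{\varphi(t, n)} \right]_{t = t/(\sigma_{\bar{n}} w)}.$$

Remembering (1) we readily obtain

$$\frac{\Phi_w'(t)}{\Phi_w(t)} = -\frac{\mu_2(n)}{w \sigma_{\bar{n}}^2}\, t + \cdots = -t + \cdots,$$

where the omitted terms tend to zero as $w \to \infty$. Thus, we find

$$\lim_{w \to \infty} \frac{\Phi_w'(t)}{\Phi_w(t)} = -t.$$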







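This result implies that the distribution $\left\{ \dfrac{\bar{n} - m_1}{\sigma_{\bar{n}}}, P(\bar{n}) \right\}$ has the limiting normal distribution $\left\{ x, \dfrac{1}{\sqrt{2\pi}}\, e^{-\frac12 x^2} \right\}$ as $w \to \infty$.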


A SEQUENCE OF DISCRETE VARIABLES EXHIBITING CORRELATION 
DUE TO COMMON ELEMENTS 


By CARL H. FISCHER
University of Michigan 


1. Introduction. Studies of correlation due to common elements have been 
made more or less sporadically over the past thirty years in attempts to throw 
more light on the meaning of correlation. Numerous examples may be cited. 
One of the earliest was a study by Kapteyn [1] in which he showed that two 
sums, each of n elements drawn from a normal population with k elements in 
common, had a correlation coefficient of k/n. This was considerably generalized 
by the writer [3] who considered sums of different numbers of elements drawn 
from quite arbitrary continuous distributions. The work was extended to in- 
clude sequences of three or more such sums. Antedating this latter paper, 
Rietz [2] has devised various urn schemata in one of which pairs of drawings of $s$ balls each were produced with $t$ balls held in common. The coefficient of correlation between the numbers of white balls in each of the pairs of drawings was found to be $t/s$.

Fairly recently some interest has been shown in this subject in connection 
with the study of heredity; hence it appeared that it might be of value to present 
the following study by elementary methods of a sequence of discrete variables 


in which each member is linked to the adjacent members by various specified 
numbers of common elements. 


2. Two variables. A pair of discrete variables is defined as follows: The first, $x$, is equal to the number of white balls in a set of $s_1$ balls drawn one at a time from an urn which is so maintained that the probability of drawing a white ball is always a constant, $p$. The second, $y$, is equal to the number of white balls in a second set of $s_2$ balls formed by drawing $t_{12}$ balls at random from the $s_1$ balls of the first set plus $s_2 - t_{12}$ balls drawn directly from the urn. The numbers $s_1$ and $s_2$ may or may not be equal.

Evidently the marginal distribution of $x$ follows the Bernoulli law and is given by $\binom{s_1}{x} q^{s_1 - x} p^x$.¹ The first step in finding $P(x, y : t_{12})$, the bivariate distribution


¹ By $\binom{a}{b}$ is meant the number of combinations of $a$ items taken $b$ at a time. It shall be understood that $\binom{a}{b} = 0$ if $b < 0$ or $b > a$.
















function of $x$ and $y$ with $t_{12}$ balls in common between the two drawings, is to write the product of the three probabilities: of obtaining $x$ white balls in the first set; of drawing $d$ of these whites in the $t_{12}$ balls chosen at random from this set; of drawing exactly $y - d$ white balls among the $s_2 - t_{12}$ balls drawn directly from the urn to complete the second set. This product may readily be reduced to the form shown below in (1), symmetric in $x$ and $y$ and in $s_1$ and $s_2$, which is then summed on $d$ from 0 to $t_{12}$. Thus

$$P(x, y : t_{12}) = \sum_{d=0}^{t_{12}} \binom{s_1 - t_{12}}{x - d} \binom{t_{12}}{d} \binom{s_2 - t_{12}}{y - d} q^{s_1 + s_2 - t_{12} - x - y + d}\, p^{x + y - d}. \tag{1}$$

The marginal distribution of $x$ has already been given. From the symmetry of (1) it is obvious that the corresponding marginal distribution of $y$ must be characterized by the Bernoulli distribution function $\binom{s_2}{y} q^{s_2 - y} p^y$. The variances of the marginal distributions are $s_1 pq$ and $s_2 pq$, respectively.

We next proceed to demonstrate that both of the regression curves are linear and to find the equations of the lines. Consider an array of $x$ on $y$ for some fixed value of $y$. The mean of the array is

$$\bar{x}_y = \left[ \binom{s_2}{y} q^{s_2 - y} p^y \right]^{-1} \sum_{x=0}^{s_1} x\, P(x, y : t_{12}). \tag{2}$$

The summation in the right member of (2) may be expanded and then rewritten as

$$\sum_{d=0}^{t_{12}} \binom{t_{12}}{d} \binom{s_2 - t_{12}}{y - d} q^{s_2 - y} p^y \sum_{x=0}^{s_1} x \binom{s_1 - t_{12}}{x - d} q^{s_1 - t_{12} - x + d}\, p^{x - d}. \tag{3}$$

The inner summation in (3) is seen to equal $d + p(s_1 - t_{12})$ and hence (2) becomes

$$\bar{x}_y = \binom{s_2}{y}^{-1} \left\{ \sum_{d=0}^{t_{12}} \binom{t_{12}}{d} \binom{s_2 - t_{12}}{y - d} \left[ d + p(s_1 - t_{12}) \right] \right\}$$

$$= \binom{s_2}{y}^{-1} \left\{ t_{12} \sum_{d=1}^{t_{12}} \binom{t_{12} - 1}{d - 1} \binom{s_2 - t_{12}}{y - d} + p(s_1 - t_{12}) \binom{s_2}{y} \right\}.$$

Then the equation of the line of regression of $x$ on $y$ becomes

$$\bar{x}_y = t_{12}\, y / s_2 + p(s_1 - t_{12}). \tag{4}$$

By symmetry, the line of regression of $y$ on $x$ may be seen to be

$$\bar{y}_x = t_{12}\, x / s_1 + p(s_2 - t_{12}).$$

The square of the correlation coefficient is equal to the product of the slopes of the two regression lines, hence

$$r_{xy} = t_{12} / (s_1 s_2)^{\frac12}. \tag{5}$$

If $s_1 = s_2 = s$ we have the familiar result $t/s$.
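The bivariate distribution (1) can be tabulated exactly for small sets, which gives a direct check that the probabilities sum to one and that the covariance is $t_{12}\,pq$, so that the correlation is $t_{12}/(s_1 s_2)^{1/2}$. The sketch below uses rational arithmetic; the numerical values of $s_1$, $s_2$, $t_{12}$, $p$ and the function names are arbitrary choices for illustration.

```python
from fractions import Fraction
from math import comb


def binom(a, b):
    """Number of combinations, taken as 0 when b < 0 or b > a."""
    return comb(a, b) if 0 <= b <= a else 0


def joint_probability(x, y, s1, s2, t12, p):
    """P(x, y : t12), summed over d, the number of shared white balls."""
    q = 1 - p
    total = Fraction(0)
    for d in range(t12 + 1):
        weight = binom(s1 - t12, x - d) * binom(t12, d) * binom(s2 - t12, y - d)
        if weight:
            total += weight * q ** (s1 + s2 - t12 - x - y + d) * p ** (x + y - d)
    return total


def total_and_covariance(s1, s2, t12, p):
    """Sum of all probabilities and the covariance of x and y."""
    table = {(x, y): joint_probability(x, y, s1, s2, t12, p)
             for x in range(s1 + 1) for y in range(s2 + 1)}
    ex = sum(x * pr for (x, _), pr in table.items())
    ey = sum(y * pr for (_, y), pr in table.items())
    exy = sum(x * y * pr for (x, y), pr in table.items())
    return sum(table.values()), exy - ex * ey


if __name__ == "__main__":
    p = Fraction(1, 3)
    print(total_and_covariance(5, 4, 3, p))   # covariance should equal 3 * p * (1 - p)
```

Dividing the covariance by $(s_1 pq \cdot s_2 pq)^{1/2}$ then gives $t_{12}/(s_1 s_2)^{1/2}$.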










3. Three variables. A third variable, $z$, may now be defined as the number of white balls in a set of $s_3$ balls formed by drawing $t_{23}$ balls at random from the $s_2$ of the second set plus $s_3 - t_{23}$ drawn directly from the urn. It is evident from the results on two variables that the marginal distribution of $z$ follows the Bernoulli law and that the equations of the regression lines of $z$ on $y$ and $y$ on $z$ are

$$\bar{z}_y = t_{23}\, y / s_2 + p(s_3 - t_{23}),$$
$$\bar{y}_z = t_{23}\, z / s_3 + p(s_2 - t_{23}).$$

The correlation coefficient, $r_{yz}$, is equal to $t_{23} / (s_2 s_3)^{\frac12}$.

The relationship between $x$ and $z$ remains to be investigated. The probability of the joint occurrence of $x$ whites on the first drawing and $z$ whites on the third when it is specified that the $s_1$ and $s_3$ balls of the two sets shall include the same $g$ balls in common is given by the right member of (1) with $g$, $z$, and $s_3$ replacing $t_{12}$, $y$, and $s_2$, respectively. When this expression is multiplied by the probability that the first and third sets do contain exactly $g$ balls in common and the product is summed on $g$ over the range 0 to $t_{12}$, we have $P(x, z : t_{12}, t_{23})$, the bivariate distribution function of $x$ and $z$. Thus

$$P(x, z : t_{12}, t_{23}) = \sum_{g=0}^{t_{12}} \binom{t_{12}}{g} \binom{s_2 - t_{12}}{t_{23} - g} \binom{s_2}{t_{23}}^{-1} P(x, z : g). \tag{6}$$

The mean of the array of $x$ on $z$ for any fixed $z$ may be written, after inverting the order of summation:

$$\bar{x}_z = \sum_{g=0}^{t_{12}} \left[ \left( \binom{s_3}{z} q^{s_3 - z} p^z \right)^{-1} \sum_{x=0}^{s_1} x\, P(x, z : g) \right] \binom{t_{12}}{g} \binom{s_2 - t_{12}}{t_{23} - g} \binom{s_2}{t_{23}}^{-1}. \tag{7}$$

The expression within the square brackets of (7) is identical in form with the right member of (2), and hence we now have

$$\bar{x}_z = \frac{t_{12} t_{23}}{s_2 s_3}\, z + p\, \frac{s_1 s_2 - t_{12} t_{23}}{s_2}. \tag{8}$$

By symmetry,

$$\bar{z}_x = \frac{t_{12} t_{23}}{s_1 s_2}\, x + p\, \frac{s_2 s_3 - t_{12} t_{23}}{s_2}.$$

The coefficient of correlation between $x$ and $z$ is found to be

$$r_{xz} = \frac{t_{12} t_{23}}{s_2 (s_1 s_3)^{\frac12}}. \tag{9}$$
























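It will be observed that

$$r_{xz} = r_{xy}\, r_{yz}. \tag{10}$$

Interesting relationships also exist among the partial and multiple correlation coefficients and the multiple regression surfaces. It will be convenient here to measure each variate from its mean and to replace the subscripts $x$, $y$, and $z$ on $r$ by 1, 2, and 3, respectively. Then the multiple regression surface of each variable on the other two may be conveniently expressed in terms of the cofactors of the correlation determinant. From the results found by the writer [4] for the case where each element $r_{ij}$ of the correlation determinant may be expressed as the product $r_{i,i+1} \cdot r_{i+1,i+2} \cdots r_{j-1,j}$, we now have

$$R_{11} = 1 - r_{23}^2, \qquad R_{12} = -r_{12}(1 - r_{23}^2),$$
$$R_{22} = 1 - r_{13}^2, \qquad R_{23} = -r_{23}(1 - r_{12}^2),$$
$$R_{33} = 1 - r_{12}^2, \qquad R_{13} = 0.$$

Then the regression planes of $x$ on $y$ and $z$ and of $z$ on $x$ and $y$ are given, respectively, by

$$x = \frac{r_{12} \sigma_1}{\sigma_2}\, y = \frac{t_{12}}{s_2}\, y,$$

$$z = \frac{r_{23} \sigma_3}{\sigma_2}\, y = \frac{t_{23}}{s_2}\, y.$$

The regression plane of $y$ on $x$ and $z$ is

$$y = \frac{\sigma_2}{1 - r_{12}^2 r_{23}^2} \left\{ \frac{r_{12}(1 - r_{23}^2)}{\sigma_1}\, x + \frac{r_{23}(1 - r_{12}^2)}{\sigma_3}\, z \right\}$$

$$= \frac{(s_2^2 s_3 - s_2 t_{23}^2)\, t_{12}}{s_1 s_2^2 s_3 - t_{12}^2 t_{23}^2}\, x + \frac{(s_1 s_2^2 - s_2 t_{12}^2)\, t_{23}}{s_1 s_2^2 s_3 - t_{12}^2 t_{23}^2}\, z.$$

The three multiple correlation coefficients are

$$r_{1 \cdot 23} = r_{12}, \qquad r_{3 \cdot 12} = r_{23}, \qquad r_{2 \cdot 13} = \left[ 1 - \frac{(1 - r_{12}^2)(1 - r_{23}^2)}{1 - r_{12}^2 r_{23}^2} \right]^{\frac12}. \tag{11}$$

The partial correlation coefficients are

$$r_{12 \cdot 3} = r_{12} \left[ \frac{1 - r_{23}^2}{1 - r_{12}^2 r_{23}^2} \right]^{\frac12}, \qquad r_{23 \cdot 1} = r_{23} \left[ \frac{1 - r_{12}^2}{1 - r_{12}^2 r_{23}^2} \right]^{\frac12}, \qquad r_{13 \cdot 2} = 0. \tag{12}$$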
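These expressions can be compared with the usual textbook formulas for partial and multiple correlation in three variables, evaluated at $r_{13} = r_{12} r_{23}$. The sketch below does so for one arbitrary choice of set sizes and overlaps; the general formulas used are the standard ones and the function names are hypothetical.

```python
import math


def partial_corr(r_ab, r_ac, r_bc):
    """Partial correlation of a and b with c held fixed."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


def multiple_corr(r_ab, r_ac, r_bc):
    """Multiple correlation of a on b and c."""
    num = r_ab ** 2 + r_ac ** 2 - 2 * r_ab * r_ac * r_bc
    return math.sqrt(num / (1 - r_bc ** 2))


# Illustrative set sizes s1, s2, s3 and overlaps t12, t23.
s1, s2, s3, t12, t23 = 6, 5, 4, 3, 2
r12 = t12 / math.sqrt(s1 * s2)
r23 = t23 / math.sqrt(s2 * s3)
r13 = r12 * r23

if __name__ == "__main__":
    print("r_13.2 =", partial_corr(r13, r12, r23))
    print("r_1.23 =", multiple_corr(r12, r13, r23), "vs r_12 =", r12)
    print("r_3.12 =", multiple_corr(r23, r13, r12), "vs r_23 =", r23)
```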


4. k variables. A sequence of $k$ variables may be formed successively as were the three considered above. It will be convenient here to designate the variables by $x_i$ ($i = 1, 2, \ldots, k$). We also define $h_i$ as the total number of balls held in common between the first and the $i$-th drawings. Then, as special cases, $h_1 = s_1$ and $h_2 = t_{12}$.










The bivariate distribution functions, regression lines, and correlation coeffi- 
cients associated with any two consecutive variables in the sequence and with 


any two variables separated by only one other variable can, from the preceding 
results, be written at once. 


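It is not difficult to derive the bivariate distribution function for $x_1$ and $x_k$ by an extension of the method used in deriving (6). We then have

$$P(x_1, x_k : t_{12}, t_{23}, \ldots, t_{k-1,k}) = \sum_{h_3=0}^{h_2} \cdots \sum_{h_k=0}^{h_{k-1}} \left\{ \prod_{i=3}^{k} \binom{h_{i-1}}{h_i} \binom{s_{i-1} - h_{i-1}}{t_{i-1,i} - h_i} \binom{s_{i-1}}{t_{i-1,i}}^{-1} \right\} P(x_1, x_k : h_k). \tag{13}$$

The equation of the line of regression of $x_1$ on $x_k$ is

$$\bar{x}_1 = \left[ \binom{s_k}{x_k} q^{s_k - x_k} p^{x_k} \right]^{-1} \sum_{x_1=0}^{s_1} x_1\, P(x_1, x_k : t_{12}, t_{23}, \ldots, t_{k-1,k}).$$

This may be reduced, by repeated applications of the steps illustrated in the corresponding case for three variables, to the form

$$\bar{x}_1 = \frac{t_{12} t_{23} \cdots t_{k-1,k}}{s_2 s_3 \cdots s_k}\, x_k + p\, \frac{s_1 s_2 \cdots s_{k-1} - t_{12} t_{23} \cdots t_{k-1,k}}{s_2 s_3 \cdots s_{k-1}}. \tag{14}$$

By symmetry, we have

$$\bar{x}_k = \frac{t_{12} t_{23} \cdots t_{k-1,k}}{s_1 s_2 \cdots s_{k-1}}\, x_1 + p\, \frac{s_2 s_3 \cdots s_k - t_{12} t_{23} \cdots t_{k-1,k}}{s_2 s_3 \cdots s_{k-1}}.$$

Then the simple correlation coefficient between $x_1$ and $x_k$ is

$$r_{1k} = \frac{t_{12} t_{23} \cdots t_{k-1,k}}{s_2 s_3 \cdots s_{k-1} (s_1 s_k)^{\frac12}} = r_{12} \cdot r_{23} \cdots r_{k-1,k}. \tag{15}$$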


It was shown by the writer [4] that for a sequence such as we are considering 
the multiple correlation coefficient is a function only of the variables immedi- 
ately adjacent to the one considered, and that the partial correlation coefficient 
is zero for any pairs except those of consecutive variables in the sequence. Thus, 
the formulas given in terms of simple correlation coefficients for the case of a 


sequence of three variables may be interpreted so as to cover the case for k 
variables. 
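The urn scheme for a sequence of $k$ variables lends itself to a direct simulation, against which the product form of the correlation between the first and last members can be compared. The sketch below is only an illustration: the set sizes, overlaps, value of $p$, seed and number of trials are arbitrary, and the function names are hypothetical.

```python
import math
import random


def draw_sequence(sizes, overlaps, p, rng):
    """White-ball counts for one realisation of the chained urn scheme."""
    current = [rng.random() < p for _ in range(sizes[0])]      # True = white
    counts = [sum(current)]
    for size, t in zip(sizes[1:], overlaps):
        kept = rng.sample(current, t)                          # balls held in common
        fresh = [rng.random() < p for _ in range(size - t)]    # drawn from the urn
        current = kept + fresh
        counts.append(sum(current))
    return counts


def correlation(u, v):
    """Ordinary product-moment correlation of two equally long lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


def simulate_end_correlation(sizes, overlaps, p, trials, seed):
    rng = random.Random(seed)
    first, last = [], []
    for _ in range(trials):
        counts = draw_sequence(sizes, overlaps, p, rng)
        first.append(counts[0])
        last.append(counts[-1])
    return correlation(first, last)


def predicted_end_correlation(sizes, overlaps):
    """Product of t_{i,i+1} / sqrt(s_i * s_{i+1}) over consecutive pairs."""
    return math.prod(t / math.sqrt(a * b)
                     for t, a, b in zip(overlaps, sizes, sizes[1:]))


if __name__ == "__main__":
    sizes, overlaps = [6, 5, 6, 4], [4, 3, 3]
    print(simulate_end_correlation(sizes, overlaps, 0.4, 20000, 7),
          predicted_end_correlation(sizes, overlaps))
```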


REFERENCES 


[1] J. C. Kapteyn, "Definition of the correlation-coefficient," Monthly Notices Roy. Astron. Soc., Vol. 72 (1912), pp. 518-525.

[2] H. L. Rietz, "Urn schemata as a basis for the development of correlation theory," Annals of Math., Vol. 21 (1920), pp. 306-322.

[3] C. H. Fischer, "On correlation surfaces of sums with a certain number of random elements in common," Annals of Math. Stat., Vol. 4 (1933), pp. 103-126.

[4] C. H. Fischer, "On multiple and partial correlation coefficients of a certain sequence of sums," Annals of Math. Stat., Vol. 4 (1933), pp. 278-284.














REPORT OF THE NEW YORK MEETING OF THE INSTITUTE 


The Seventh Annual Meeting of the Institute of Mathematical Statistics was 
held from Saturday to Tuesday, December 27-30, 1941, in conjunction with 
the meetings of the Allied Social Science Associations. With the exception of 
the session on Tuesday afternoon, all sessions were held at the Biltmore Hotel. 


The following one hundred seventy-seven members* of the Institute attended 
the meeting: 


F. L. Alt, H. E. Arnold, K. J. Arnold, L. A. Aroian, K. J. Arrow, R. W. Bachelor, I. L. 
Battin, B. M. Bennett, Carl Bennett, Joseph Berkson, Felix Bernstein, E. E. Blanche, C. 
I. Bliss, A. J. Bonis, Paul Boschan, A. H. Bowker, D. S. Brady, A. E. Brandt, R. H. Brown,
R. W. Burgess, J. H. Bushey, Belle Calderon, B. H. Camp, J. M. Clarkson, W. G. Cochran, 
A. C. Cohen, Jr., M. S. Cohen, Isadore Cohn, J. B. Coleman, L. M. Court, D. R. G. Cowan,
Gertrude Cox, C. C. Craig, B. B. Day, D. B. DeLury, W. E. Deming, W. J. Dixon, H. F. 
Dodge, H. F. Dorn, Paul Dorweiler, David Durand, J. H. Dutka, P. S. Dwyer, Churchill 
Eisenhart, W. F. Elkin, J. S. Elston, M. L. Elveback, D. R. Embody, W. D. Evans, Willy 
Feller, J. W. Fertig, Irving Fisher, W. C. Flaherty, M. M. Flood, R. M. Foster, L. R. 
Frankel, H. A. Freeman, G. R. Gause, Hilda Geiringer, C. H. Graves, J. A. Greenwood, 
J. I. Griffin, C. C. Grove, F. E. Grubbs, E. J. Gumbel, M. J. Hagood, H. J. Hand, M. H.
Hansen, Myron Heidingsfield, Edward Helly, G. M. Hopper, Harold Hotelling, E. A. Hoy,
William Hurwitz, Seymour Jablon, W. W. Jacobs, Rachel Jenss, Myron Kantorovitz, Karl 
Karsten, Leo Katz, C. J. Kiernan, B. F. Kimball, A. J. King, L. F. Knudsen, H. S. Konijn, 
Tjalling Koopmans, R. L. Kozelka, A. K. Kurtz, A. R. Kury, S. M. Kwerel, Jack Laderman,
Oscar Lange, D. H. Leavens, B. A. Lengyel, Howard Levene, Ida Levin, M. J. Liss, Irving 
Lorge, A. J. Lotka, Eugene Lukacs, G. A. Lundberg, P. J. McCarthy, W. G. Madow, Benjamin 
Malzberg, Henry Mann, Jakob Marschak, J. W. Mauchly, G. F. T. Mayer, Margaret Merrell, 
J. N. Michie, J. R. Miner, Nathan Morrison, J. E. Morton, F. C. Mosteller, M. R. Neifeld,
Harold Nisselson, G. E. Niver, M. L. Norden, Nilan Norris, J. I. Northam, C. O. Oakley, 
E. G. Olds, P. S. Olmstead, J. G. Osborne, R. F. Passano, Edward Paulson, C. K. Payne, 
Victor Perlo, J. M. Perotti, L. M. Petit, G. A. D. Preinreich, Harry Press, Elsie Ratkowitz,
L. J. Reed, F. V. Reno, J. S. Ripandelli, Selby Robinson, H. G. Romig, A. C. Rosander, 
Ernest Rubin, H. A. Ruger, P. A. Samuelson, M. M. Sandomire, Max Sasuly, F. E. Satterthwaite,
Henry Scheffe, H. L. Schug, H. A. Secrist, Nathan Seiden, W. A. Shelton, R. W.
Shephard, W. A. Shewhart, H. M. Shulman, Harry Siller, R. R. Singleton, L. E. Smart, J.
H. Smith, G. W. Snedecor, Emma Spaney, Mortimer Spiegelman, Arthur Stein, M. S. Stevens,
J. S. Stock, M. M. Torrey, M. N. Torrey, W. R. Van Voorhis, D. F. Votaw, Jr., W. C.
Waite, H. M. Walker, W. A. Wallis, A. N. Watson, E. W. Wilson, C. P. Winsor, Jacob 
Wolfowitz, M. A. Woodbury, W. J. Youden, Joseph Zubin. 


The opening session on Saturday afternoon on The Role of Tests of Significance 
in Biological Research was held jointly with the Biometrics Section of the 
American Statistical Association. Professor E. B. Wilson of the Harvard 
School of Public Health acted as chairman. The session was in the form of a 
round table discussion, the principal discussants being: W. Edwards Deming, 
Bureau of the Census; Harold Hotelling, Columbia University; Lowell J. Reed, 
Johns Hopkins University; and George W. Snedecor, Iowa State College. 


* The list of attendance has been compiled from the registration list supplied by the 
Director of the New York Convention and Visitors Bureau. 










REPORT OF NEW YORK MEETING 103 


On Saturday evening, under the chairmanship of Dr. Walter A. Shewhart of
Bell Telephone Laboratories, a session was held jointly with the Econometric
Society on Theory of Runs and Confidence Intervals. The following program was
presented:


1. The theory of runs in random data. 
Harold T. Davis, Northwestern University. 

2. Five time series significance tests based on signs of differences. 
Geoffrey H. Moore, Rutgers University. 
W. Allen Wallis, Stanford University. 

3. Confidence intervals for the unknown median of any type of universe.
John H. Smith, University of Chicago. 


The morning and afternoon sessions on Sunday on Numerical Computational 
Devices were held jointly with the American Statistical Association, with the co- 
operation of the Committee on Addresses in Applied Mathematics of the 
American Mathematical Society. Dr. C. R. Langmuir of the Carnegie Founda- 
tion for the Advancement of Teaching acted as chairman of the morning session 
on Statistical and Matrix Calculation. The following papers were presented:


1. Some matrix methods in least square and other multivariate problems.
Harold Hotelling, Columbia University.
2. The Mallock electrical calculating machine for solving simultaneous linear equations.
Elizabeth Monroe Boggs, Cornell University.
3. Mathematical operations with punched cards.
J. C. McPherson, International Business Machines Corporation.
4. Recent developments in correlation technique.
Paul S. Dwyer, University of Michigan.


The subject of the afternoon session was Mechanical Solution of Differential 
Equations. Dr. R. M. Foster of the Bell Telephone Laboratories presided for 
the following program: 


1. Punch card calculation of orbits. 
W. J. Eckert, Naval Observatory. 
2. Punch card methods for solving linear differential equations of second order. 
Martin Schwarzschild, Columbia University. 
3. Differential analyzers. 
Harold L. Hazen, Massachusetts Institute of Technology. 
Discussants: 
L. S. Dederick, Aberdeen Proving Ground. 
Norbert Wiener, Massachusetts Institute of Technology. 


Professor Helen Walker of Columbia University held the chair at the Sunday
evening session, a joint session with the American Statistical Association. The
following program was given under the title: On Some Technical Aspects of
Sampling.


1. On the relative efficiencies of various areal sampling units in population inquiries. 
M. H. Hansen, Bureau of the Census.
William Hurwitz, Bureau of the Census.
2. On the monthly sample survey of unemployment.
L. R. Frankel, Work Projects Administration.
J. S. Stock, Work Projects Administration.
3. On certain biases in surveys by questionnaire.
J. Cornfield, Bureau of Labor Statistics.
4. On the relation of probability to sampling.
W. G. Madow, Bureau of the Census.
5. Recent developments in sampling for agricultural statistics.
G. W. Snedecor, Iowa State College.
A. J. King, Iowa State College.
Discussants:
W. G. Cochran, Iowa State College.
J. A. Greenwood, Duke University.















































Another joint session with the American Statistical Association was held on 
Monday morning. The topic considered was: What Can the Census Do With 
Sampling? Professor L. Edwin Smart of Ohio State University presided for the 
following program: 












1. An appraisal of the 1940 sampling scheme. 
T. O. Yntema and Dickson H. Leavens, Cowles Commission for Research in 
Economics. 
2. Some requirements of sampling design and presentation. 
W. Edwards Deming, Bureau of the Census. 
3. Compromises, losses, and gains brought about by the introduction of sampling. 
L. E. Truesdell, Bureau of the Census. 
4. The proposed annual sample census. 
Philip M. Hauser, Bureau of the Census. 
Discussants: 
A. N. Watson, Curtis Publishing Company. 
F. F. Stephan, Office of Production Management. 
S. A. Stouffer, University of Chicago. 















































On Monday afternoon, a session was held for the reading of contributed 
papers on Probability and Statistics. Professor Harold Hotelling acted as 
chairman, and the following papers were read: 

















1. Scanning data to determine significance of difference between frequency of an event in 
contrasted groups. 
Joseph Zubin, New York State Psychiatric Institute. 
2. Compounding probabilities from independent significance tests. 
W. Allen Wallis, Stanford University. 
3. A class of multivariate distributions. 
Walter Jacobs, Securities and Exchange Commission. 
4. Definition of the probable error. 
E. J. Gumbel, New School for Social Research. 
5. A generalized analysis of variance. 
F. E. Satterthwaite, University of Iowa. 
6. On the power function of the analysis of variance test. 
Abraham Wald, Columbia University.
7. Method of computing the roots of cubic and quartic equations by hyperbolic and circular 
functions. 
E. E. Blanche, Michigan State College.
8. Additive partition functions.
J. Wolfowitz, Columbia University.
9. Limited type of probability distribution applied to flood flows. (Preliminary report) 
B. F. Kimball, New York State Public Service Commission. 
Abstracts of these papers follow this report. 


Professor Harold Hotelling acted as chairman for the session on Tuesday 
morning, held jointly with the Econometric Society and the American Statistical 
Association. The program consisted of invited addresses on Recent Advances in 
Mathematical Statistics by Professors Burton H. Camp of Wesleyan University 
and Cecil C. Craig of the University of Michigan. 

The session on Tuesday afternoon was held at The Boyce Thompson Institute, 
Yonkers, New York. It was a joint session with the Biometrics Section of the 
American Statistical Association on The Design of Experiments. Dr. W. J. 
Youden of The Boyce Thompson Institute acted as chairman and had various 
experimental designs on display in the greenhouse. Through the courtesy of 
members of the Institute staff, transportation between the railroad station and
the Institute was provided. After the program, tea was served. The following
papers were read:


1. Biological interpretation of interactions. 
W. C. Jacobs, Cornell University. 

2. Adapting the design to the experiment. 
Gertrude M. Cox, North Carolina State College. 

3. Sampling theory when the sampling units are of unequal size. 
W. G. Cochran, Iowa State College. 


4. Sampling errors of systematic and random surveys of cover type areas. 
J. G. Osborne, U.S. Forest Service. 


A luncheon meeting Monday noon was held jointly with the Econometric 
Society and was attended by ninety-four persons. Professor W. C. Mitchell 
of Columbia University presided and called on Irving Fisher, Harold Hotelling, 
W. G. Cochran, and W. A. Wallis for brief remarks. 

The annual business meeting of the Institute was held late Monday after- 
noon, with President Hotelling presiding. 

The report of the Secretary-Treasurer was read. The report appears on 
pp. 107-109. 

President Hotelling stated that Mr. George W. Petrie, III, had audited the 
books and records of the Treasurer and found them to be in agreement with 
the Report presented. 

Dr. Madow, who acted as teller, reported that the mail balloting had resulted 
in the election of the following officers for 1942: 


President: Professor C. C. Craig 

Vice-Presidents: Professor A. T. Craig 
Mr. E. C. Molina 

Secretary-Treasurer: Professor E. G. Olds 













After discussing various ways of broadening the service of the Institute, a 
motion was carried which recommended that the Board of Directors appoint 
committees to study the following matters: junior memberships, local chapters, 
and advertising for the official journal. Later the Board approved this recom- 
mendation and committees were appointed. 

EDWIN G. OLDS,
Secretary 


REPORT OF THE DALLAS MEETING OF THE INSTITUTE 



































The twelfth meeting of the Institute was held jointly with the meetings of 
Section A of the American Association for the Advancement of Science and of 
the Econometric Society in Dallas on December 29-30, 1941. Professor 
Dunham Jackson, Secretary of Section A of the A. A. A. S., has kindly sent 
the following information regarding the meeting: 


Sessions of the joint meeting of the Institute of Mathematical Statistics 
with the Econometric Society and Section A of the A. A. A. S. were held Monday
afternoon, December 29, and Tuesday morning and afternoon, December 30, 
at Southern Methodist University. The number of contributed papers offered 
on Tuesday was such as to cause extension of the session into the afternoon. 

On Monday afternoon addresses were delivered, in accordance with the 
programs issued in advance, by Professor A. B. Coble of the University of 
Illinois, retiring Vice President for the Section, on A Certain Set of Ten Points
in Space, and Professor S. S. Wilks of Princeton University on Representative
Sampling.

The order of papers on Tuesday was as follows: 


1. On the theory of the tetrahedron. 
N. A. Court, University of Oklahoma. 
2. A method for integrating the linear hyperbolic equation in three independent variables. 
E. W. Titt, University of Texas. 
3. On powers of a matrix whose elements are sets of points.
S. T. Sanders, Jr., Southwestern Louisiana Institute. 
4. Analytic theory of parametric linear partial differential equations. 
W. J. Trjitzinsky, University of Illinois. 
5. The theory of the Riesz integral. 
H. J. Ettlinger, University of Texas. 
6. Obtaining differences from tables which are in the form of punched cards. 
Harry Pelle Hartkemeier, University of Missouri. 
7. On investment and the valuation of capital. 
Montgomery D. Anderson, University of Florida. 
8. Advantages of singling out degrees of freedom in analyses of variances. 
W. D. Baten, Michigan State College. 
9. The incidence of an income tax on saving. 
Abram Bergson, University of Texas. 
10. Certain tests for randomness applied to data grouped in small sets. 
Edward L. Dodd, University of Texas.
11. Stratified sampling (Preliminary Report).
A. M. Mood, University of Texas.
12. On convergence factors in convergent integrals.
Charles N. Moore, University of Cincinnati.
13. Geometric statement of a fundamental theorem for four-dimensional orthographic
axonometry.
W. H. Roever, Washington University.
14. A certain non-metric Moore space.
F. B. Jones, University of Texas.
Abstracts of papers 8, 10, and 11 follow this report. 


Papers 1 to 8 inclusive on this list were presented Tuesday morning, and 
papers 9 to 14 at the afternoon session. In the absence of the authors, papers 
10 and 12 were read by title.

The presiding officer Monday afternoon was Professor G. T. Whyburn of 
the University of Virginia, Chairman of the Section and Vice President of the 
A. A. A. S. On Tuesday Professor H. J. Ettlinger of the University of Texas
presided for papers 1 to 4 inclusive, and Professor S. S. Wilks of Princeton 
University for the rest of the program. 

EDWIN G. OLDS,
Secretary 


ANNUAL REPORT OF THE SECRETARY-TREASURER OF THE 
INSTITUTE 


On September 2-4, the Institute met at the University of Chicago, in conjunc- 
tion with meetings of the American Mathematical Society, Mathematical 
Association of America, and Econometric Society. Sixty-eight members of the 
Institute attended the meeting. 

As mentioned in the 1940 report of the Secretary, the Institute became 
affiliated with the American Association for the Advancement of Science at the 
close of 1940. President Hotelling appointed Professor Truman L. Kelley as the 
representative of the Institute on the Executive Council of the A.A.A.S. for 1941. 

On December 29-30, 1941, the Institute held two joint sessions with Section A 
of the A.A.A.S. and the Econometric Society in connection with the Annual 
Meeting of the A.A.A.S. at Dallas, Texas. Professor Wilks gave an address 
at one of the sessions. The report of the Seventh Annual Meeting of the Insti- 
tute appears on pp. 102-106. 

The Institute was invited to send an official representative to the Academic 
Festival of the University of Chicago, September 27-29, 1941. Mr. John F. 
Kenney was appointed as the representative of the Institute. 

During the past year, the Secretary has received a number of inquiries from 
members regarding opportunities for doing statistical work in business, govern- 
ment, and industry. While the Institute has no particular organization for 
such service, the Secretary will be glad to supply information regarding positions 
which come to his attention. 










The Institute has printed an official abstract blank to be used in submitting 
abstracts for contributed papers. A supply of these blanks can be obtained by 
writing to the Secretary. 

The deaths of two of the members of the Institute have been reported since 
the last Annual Meeting: Professor James W. Glover, University of Michigan, 
and Mr. M. C. MacLean, Dominion Bureau of Statistics, Ottawa. 

The following financial statement covers the period from January 1, 1941 to 
December 10, 1941: 


RECEIPTS
    [entries illegible in the source]
    ROCKEFELLER FOUNDATION GRANT

ANNALS OFFICE
    [illegible] ............................................ $78.95
    [illegible] ............................................ 82.02
    Printing
WAVERLY PRESS
    Printing and Mailing Annals, 4 issues .................. 2,913.09
BACK NUMBERS OFFICE
    Purchase of back numbers from H. C. Carver ............. 216.74
    Reprinting 200 copies of Vol. V, No. 3
SECRETARY-TREASURER'S OFFICE
    [illegible] ............................................ $52.06
    [illegible] ............................................ 150.04
    [illegible] ............................................ 112.78
    Clerical Help
PRINTING PROGRAMS FOR MEETINGS
MISCELLANEOUS
BALANCE ON HAND, December 10, 1941

$5,646.25

In comparison with the financial condition of the Institute at the end of 1940, 
the receipts from dues, subscriptions, and sales of back numbers have increased
nearly two thousand dollars. This is largely due to a net increase of 171 members
and 20 subscriptions. Early in the year the Institute received the last
thousand dollars of its grant from the Rockefeller Foundation. This source of 
income has materially assisted the Institute in surviving a period of financial 
uncertainty. Its loss will be severely felt. 

The expenditures of the Institute show a slight decrease, partly due to the 
fact that fewer back issues of the Annals had to be reprinted. An unnecessarily 
large item of expense is that of the postage which has to be paid because of the 
slowness of some members and subscribers in paying dues and reporting changes 
of address. Many copies of the Annals have to be reclaimed and mailed a 
second time. Members could save the Institute considerable expense if they 
would pay their dues promptly and report change of address well in advance of 
publication dates of the Annals. 

Financial prospects for 1942 are mixed. The importance of the statistical 
approach to problems of national defense has caused increased interest in mathe- 
matical statistics with the result that many people employed in government 
service or industry are applying for membership and urging their libraries to 
subscribe to the Annals. On the other hand, delivery to, and collection from, 
foreign libraries is becoming increasingly difficult, and a marked decrease in 
the number of foreign subscriptions can be anticipated. Furthermore, operating 
expenses of the Institute are almost certain to increase as material and labor 
costs advance. On the whole, it seems very probable that it will require the 
full co-operation of all the members to avoid operation at a loss during the next 
calendar year. 

EDWIN G. OLDS,
Secretary-Treasurer.
December 29, 1941 
























ABSTRACTS OF PAPERS 
I. Presented on December 27, 1941, at the New York Meeting of the Institute 


A Generalized Analysis of Variance. FRANKLIN E. SATTERTHWAITE, Uni- 
versity of Iowa and Aetna Life Insurance Company. 








This paper examines the fundamental principles underlying designs for the analysis
of variance. Given several statistics of the type $\chi_i^2 = \sum_j \theta_{ij}^2$, where the $\theta$'s are arbitrary
orthogonalized linear functions of certain underlying normal data, $x_k$, a rule is set up for
determining a set of $m_k$ as linear functions of the $x_k$ such that $\chi_0^2 = \sum_k (x_k - m_k)^2$ will be
independent of the remaining $\chi_i^2$'s. Further it is shown that simultaneously with the
above, the $x$'s and the $\theta$'s may be subjected to certain types of linear restrictions (for the
purpose of estimating parameters or otherwise) without disturbing the distributions or 
the independence relations except for the appropriate reduction in degrees of freedom. 
The rule used to determine the m’s gives results consistent with the standard designs for 
the analysis of variance. However, it goes further in that one may use weighted rather 
than simple averages in setting up his design. A practical application of this is the two
way analysis of data which are averages and lack homogeneity of variance though constants
of proportionality between the variances are known. The two way analysis of
incomplete data is another practical problem which is solved by the simple expedient 
of a zero weight. The use of weighted averages frequently introduces difficulties in esti- 
mating parameters, particularly the mean. The combination of the linear restriction 
concept with standard analysis of variance methods solves this difficulty. 















On the Power Function of the Analysis of Variance Test. ABRAHAM WALD,
Columbia University.





It is known that the power function of the analysis of variance test depends only on a
single parameter, say $\lambda$, where $\lambda$ is a certain function of the parameters involved in the
distribution of the sample observations. Let Z be any critical region (subset of the sample
space) whose size does not depend on unknown parameters, i.e., it has the same size for
all values of the parameters which are compatible with the hypothesis to be tested. It is
shown that for any positive c the average power (a certain weighted integral of the power
function) of the region Z over the surface $\lambda = c$ cannot exceed the power of the analysis
of variance test on the surface $\lambda = c$ (the power of the latter test is constant on the surface
$\lambda = c$). P. L. Hsu's result, Biometrika, January, 1941, pp. 62-68, follows from this as a
corollary.


Definition of the Probable Error. E. J. GUMBEL, The New School for Social
Research.


The probable error is usually defined either as the semi-interquartile range or as 2/3 of
the standard error. We define it as half of the smallest interval that has the probability 1/2.
For distributions which never increase (decrease), the beginning (end) of this interval is
the origin (the median), and the end is the median (the end of the distribution). In general
the probable error $\rho$ is the solution of the equations $W(\xi + \rho) - W(\xi - \rho) = \tfrac{1}{2}$ and
$w(\xi + \rho) = w(\xi - \rho)$, where $\xi$ denotes the midpoint of the interval. For symmetrical distributions
the first definition remains valid. For the Gaussian distribution the second definition
holds besides. The numerical values for the midpoint $\xi$ and the probable error $\rho$ are given
for some distributions usual in statistics. The calculation of the standard error of the
probable error, which depends upon the distribution $w(x)$, determines whether the probable
error is more or less precise than the standard error. For the asymmetrical exponential
distribution the mean and the median have the same precision, and the probable error is 
more precise than the standard error. For the first law of Laplace, and for Galton’s re- 
duced distribution the median and the probable error are more precise than the mean 
and the standard error. For Maxwell’s distribution the mean and the probable error are 
more precise than the median and the standard error. 
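
As a rough numerical illustration of the definition used in this abstract (half of the smallest interval that has probability 1/2), the following Python sketch scans candidate intervals for the unit exponential distribution. The function names, the grid scan, and the choice of the unit exponential are assumptions made for illustration; they are not taken from the paper.

```python
import math

def exp_quantile(q):
    """Inverse CDF of the unit exponential distribution (illustrative choice)."""
    return -math.log(1.0 - q)

def shortest_half_interval(quantile, steps=10000):
    """Scan left endpoints a with W(a) in [0, 1/2) and return the shortest
    interval (a, b) that satisfies W(b) - W(a) = 1/2."""
    best = None
    for i in range(steps):
        lower_prob = 0.5 * i / steps          # W(a), from 0 up to just under 1/2
        a = quantile(lower_prob)
        b = quantile(lower_prob + 0.5)
        if best is None or (b - a) < (best[1] - best[0]):
            best = (a, b)
    return best

a, b = shortest_half_interval(exp_quantile)
probable_error = (b - a) / 2.0   # half of the shortest half-probability interval
midpoint = (a + b) / 2.0
```

For this never-increasing density the scan settles on the interval from the origin to the median, in line with the statement in the abstract, so the probable error comes out as (log 2)/2.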


A Class of Multivariate Distributions. WALTER JACOBS, Securities and Exchange
Commission, Washington.


The multivariate normal distribution has the property that its probability density is 
constant along the surface of a hyper-ellipsoid. The class of distributions characterized 
by this property is considered. The form of the characteristic function of any distri- 
bution of the class is determined; in this way the parameters of the distribution are shown 
to be simply related to the first and second moments, when these exist. 

Every distribution of the class is the n-variate extension of a univariate symmetrical 
distribution. The method of determining the form of the extension of such a univariate 
distribution is given. A number of properties of regression for the multivariate normal 
distribution are shown to hold for any distribution of the class. Among other properties 
considered is the form of some sampling distributions. Some special cases of interest,
including the extensions of the Cauchy distribution and the median law, are discussed
briefly.


Methods for Scanning Data to Determine the Significance of the Difference 
Between the Frequency of an Event in Contrasted Groups. JOSEPH ZUBIN,
N. Y. S. Psychiatric Institute, New York. 


In many investigations in Psychology, Sociology, Economics and Public Health, there 
is a need for a quick and ready method for scanning a mass of data in order to select the 
items that have a significant bearing on the problem under investigation. The statistical 
procedure for this item analysis consists essentially of evaluating the 2 X 2 tables which 
arise when two groups are contrasted for the presence and absence of a given character 
or event. The chi square method or its equivalent, the ratio of the difference between
per cents to its standard error, requires considerable labor and time, and several methods
have been proposed for shortening the work. Recently a method was developed which
eliminates the need for computing percentages or expected values, the analysis being made
with the absolute frequencies. This method depends upon transforming p, the per cent,
to the inverse sine function of $\sqrt{p}$. The method is applicable not only to 2 X 2 tables but
can also be made applicable to 2 X n tables and r X n tables with the aid of simple formulae. 
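
The inverse sine transformation mentioned in this abstract can be sketched in Python as follows. This is only an illustration under stated assumptions: it uses the standard large-sample result that 2 arcsin(sqrt(p)), in radians, has variance close to 1/n, and it is not claimed to reproduce the author's tabular procedure. The function names and the counts are hypothetical.

```python
import math

def angular(successes, n):
    """Angular transform phi = 2 * asin(sqrt(p)) of an observed proportion (radians)."""
    return 2.0 * math.asin(math.sqrt(successes / n))

def arcsine_z(x1, n1, x2, n2):
    """Approximate normal deviate for the difference between two proportions,
    assuming Var(phi) is about 1/n for each transformed proportion."""
    return (angular(x1, n1) - angular(x2, n2)) / math.sqrt(1.0 / n1 + 1.0 / n2)

def two_sided_p(z):
    """Two-sided normal tail probability for the deviate z."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical counts: the event occurs in 30 of 100 cases in one group
# and in 45 of 100 cases in the other.
z = arcsine_z(30, 100, 45, 100)
p_value = two_sided_p(z)
```

Because the variance of the transformed proportion does not depend on p, the same denominator serves every item being scanned, which is what makes the transformation convenient for item analysis.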


Compounding Probabilities from Independent Significance Tests. W. ALLEN
WALLIS, Stanford University.


For combining the probabilities obtained from N independent tests of significance into 
a single measure, the product of the N independent probabilities provides a criterion which, 
though rarely ideal, is usually satisfactory. The probability that such a product will be 
less than Q always exceeds Q, and is the sum of the first N terms in a Poisson series whose
parameter is $-\log_e Q$; since this sum is also the probability that a value of $\chi^2$ based on 2N
degrees of freedom will exceed $-2 \log_e Q$, existing tables of $\chi^2$ may (as R. A. Fisher has
pointed out in Statistical Methods for Research Workers, section 21.1) be used to test the
significance of a product of probabilities. If any of the probabilities have been derived
from discontinuous distributions, as is likely with small samples of non-metric data, this
method of calculating the probability of the product fails; in such instances it invariably
overstates the probability of the product. Formulas are given for various special cases
arising frequently in practice and also for the general case of D + C tests of which D are
based on discontinuous distributions and C on continuous distributions. In several illustrative
examples, the overstatement of the joint probability consequent upon neglect
of discontinuities is of the order of 100 to 200 per cent. 
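
The continuous case described in this abstract can be checked with a short Python sketch. It evaluates the Poisson sum for the probability that a product of N independent probabilities falls below Q and compares it with the chi-square tail on 2N degrees of freedom, using the closed form available for even degrees of freedom. The function names and the three probabilities are hypothetical, and the corrections for discontinuous distributions discussed by the author are not attempted.

```python
import math

def prob_product_below(q, n):
    """P(product of n independent uniform(0,1) probabilities < q):
    the first n terms of a Poisson series with parameter -log(q)."""
    lam = -math.log(q)
    return q * sum(lam ** k / math.factorial(k) for k in range(n))

def chi2_upper_tail_even(x, df):
    """Upper tail of chi-square for an even number of degrees of freedom,
    written as the equivalent Poisson sum."""
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k)
                                 for k in range(df // 2))

# Hypothetical probabilities from three independent tests of significance.
probs = [0.10, 0.20, 0.05]
q = math.prod(probs)
combined = prob_product_below(q, len(probs))
same_via_chi2 = chi2_upper_tail_even(-2.0 * math.log(q), 2 * len(probs))
```

The two computations agree to rounding error, which is the identity that allows existing chi-square tables to be used for the product of probabilities.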


A Method of Computing the Roots of the General Cubic Equation with Real or 
Complex Coefficients. ERNEST E. BLANCHE, Michigan State College.


The general cubic equation with real or complex coefficients may readily be reduced to 
the form $y^3 + 3Hy + G = 0$. Suitable substitutions for y in the reduced equation permit
the use of the identities for hyperbolic functions and circular functions: sin 3z, cos 3z,
sinh 3z, cosh 3z and sin (u + iv). The following classifications may be set up: (A) If G < 0
and H > 0, only real root is $y = 2\sqrt{H} \sinh z$ where $\sinh 3z = G/(2H\sqrt{H}) = M$; (B-1) If G < 0,
H < 0, $G/(2H\sqrt{-H}) \le 1$, three real roots, obtained by use of circular identity, cos 3z; (B-2)
If G < 0, H < 0, $G/(2H\sqrt{-H}) > 1$, only real root is $y = 2\sqrt{-H} \cosh z$ where
$\cosh 3z = G/(2H\sqrt{-H})$. Complex roots are $-\tfrac{1}{2} y_1 \pm bi$. The general cubic with complex coefficients
has solutions $y_{n+1} = -2\sqrt{H} \sin (u + 2n\pi/3 + iv)$ for n = 0, 1, 2, where sin (3u + 3iv) =
a+bi=M. For M real, special cases are similar to (A), (B-1) and (B-2). 
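
For real coefficients, the hyperbolic and circular substitutions named in this abstract can be sketched in Python as below. The sketch is an illustration only: the signs are derived directly from the identities sinh 3z = 3 sinh z + 4 sinh^3 z and cos 3z = 4 cos^3 z - 3 cos z (and the analogous identity for cosh), so they may differ from the author's sign conventions; the function name is hypothetical, and the complex-coefficient case is not covered.

```python
import math

def real_roots_reduced_cubic(H, G):
    """Real roots of y**3 + 3*H*y + G = 0 for real H and G."""
    if H == 0:
        # y**3 = -G has a single real cube root.
        return [math.copysign(abs(G) ** (1.0 / 3.0), -G)]
    if H > 0:
        # y = 2*sqrt(H)*sinh(z) with sinh(3z) = -G / (2*H*sqrt(H)).
        z = math.asinh(-G / (2.0 * H * math.sqrt(H))) / 3.0
        return [2.0 * math.sqrt(H) * math.sinh(z)]
    h = -H
    c = -G / (2.0 * h * math.sqrt(h))
    if abs(c) <= 1.0:
        # y = 2*sqrt(-H)*cos(z) with cos(3z) = c gives three real roots.
        theta = math.acos(c)
        return [2.0 * math.sqrt(h) * math.cos((theta + 2.0 * math.pi * k) / 3.0)
                for k in range(3)]
    # |c| > 1: y = sign(c)*2*sqrt(-H)*cosh(z) with cosh(3z) = |c|.
    z = math.acosh(abs(c)) / 3.0
    return [math.copysign(2.0 * math.sqrt(h) * math.cosh(z), c)]

# Example: (y - 1)(y - 2)(y + 3) = y**3 - 7*y + 6, so 3H = -7 and G = 6.
roots = sorted(real_roots_reduced_cubic(-7.0 / 3.0, 6.0))
```

Each returned value can be checked by substituting it back into the reduced equation; the remaining complex pair in the one-real-root cases follows from dividing out the real root.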


Limited Type of Probability Distribution Applied to Flood Flows (Preliminary 
Report). BRADFORD F. KIMBALL, Port Washington, N. Y.


Relative to Gumbel's recent paper on Flood Flows (E. J. Gumbel, "The return period
of flood flows," Annals of Mathematical Statistics, Vol. 12 (1941)) the author points out that
Gumbel's argument that the probability distribution of maximum values does not stem
from a limited form of primary probability distribution of the stream flow, is misleading
(see page 177, loc. cit.). One might argue for a primary probability distribution of stream
flows of the type: $dV = \exp(-\tfrac{1}{2} u^2)\,du$ where $u = k[b - \log(a - x)]$, $0 \le x \le a$, where x is
the measure of flow. This increment of x is related to normal probability increment by
the linear equation $k\,dx = (a - x)\,du$. This distribution will not satisfy the condition that
von Mises uses in his argument concerning a finite distribution, since the cumulative distribution
V does not possess a positive derivative of finite order at x = a. Also, although x
does not have infinite range, the transformed variate u has an infinite range to the right,
and will satisfy von Mises' argument for the derivation of the cumulative distribution of
the maxima, of the form $\exp[-\exp\{-\alpha(u - u_0)\}]$ in terms of u. The author finds that
such a distribution more accurately describes the behavior of maximum annual flood flows
than one which ignores the existence of an upper limit a. 
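
The change of variable in this abstract, u = k[b - log(a - x)], can be illustrated with a small Python sketch that maps a flow x below the upper limit a to the transformed variate u and back, and checks the differential relation k dx = (a - x) du numerically. The parameter values and function names are hypothetical and serve only to show the transformation; no fitting to flood data is attempted.

```python
import math

def flow_to_u(x, a, b, k):
    """Transformed variate u = k*[b - log(a - x)] for a flow x with 0 <= x < a."""
    return k * (b - math.log(a - x))

def u_to_flow(u, a, b, k):
    """Inverse transformation: x = a - exp(b - u/k)."""
    return a - math.exp(b - u / k)

# Hypothetical parameter values, chosen only for illustration.
a, b, k = 100.0, 5.0, 2.0

x = 40.0
u = flow_to_u(x, a, b, k)
x_back = u_to_flow(u, a, b, k)

# Central-difference check of du/dx = k / (a - x), i.e. k dx = (a - x) du.
h = 1e-4
du_dx = (flow_to_u(x + h, a, b, k) - flow_to_u(x - h, a, b, k)) / (2.0 * h)
```

As x approaches the limit a, log(a - x) decreases without bound, so u increases without bound; this is the infinite range to the right noted in the abstract.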

















Additive Partition Functions. J. WOLFOWITZ, New York City.


Let $n_1$ and $n_2$ be positive integers and let

$$m = \max\left(\frac{n_1}{n_1 + n_2}, \frac{n_2}{n_1 + n_2}\right).$$

Let the stochastic variable $V = (v_1, v_2, \cdots, v_s)$ be any sequence of positive integers such that
$v_1 + v_3 + v_5 + \cdots$ is equal to either one of $n_1$ and $n_2$, while $v_2 + v_4 + v_6 + \cdots$ is equal to
the other. Two sequences V with the same elements arranged in different order are to be
considered distinct and all sequences V are to be assigned the same probability. Such
sequences are of statistical importance (Wald and Wolfowitz, Annals of Math. Stat., Vol. 11
(1940)). Let $f(x)$ be a function defined for all positive integral values of $x$ which fulfills
the following conditions:
1. There exists a pair of positive integers, a and b, such that

$$\frac{f(a)}{f(b)} \ne \frac{a}{b}.$$

2. The series

$$\sum_{i=1}^{\infty} |f(i)|\, m^i$$

is convergent. Then, as $n_1$ and $n_2 \to \infty$, while $n_1/n_2$ remains constant, the distribution of
the stochastic variable

$$F(V) = \sum_{i=1}^{s} f(v_i)$$

approaches the normal distribution. When $f(x) \equiv 1$, $F(V) = U(V)$ (loc. cit., Theorem I).
When $f(x) = \log(x^x / x!)$, F(V) is a statistic introduced by the author (Amer. Math. Soc. Bull.
(1941), p. 216).
A similar result holds for partitions of a single integer.


II. Presented on December 29, 1941, at the joint session of the Institute, 
The Econometric Society, and Section A of the A. A. A. S.


Certain Tests for Randomness Applied to Data Grouped into Small Sets. 
EDWARD L. DODD, University of Texas.


G. Udny Yule, in his paper A Test of Tippett’s Random Sampling Numbers (Roy. Stat. 
Soc. Jour., Vol. 101(1938), pp. 167-172), described tests applied to certain sums of the 
Tippett numbers. Yule regarded the Tippett numbers as not altogether satisfactory. 

The tests now to be described, however, involve no summation. For sets of three 
digits, four classes may be distinguished: The middle number may be the largest, or it 
may be the least; or the sequence may be monotone increasing or monotone decreasing;
here the sequence a, a, a may be classified with the monotone increasing sequences when
a > 4; otherwise, with the monotone decreasing sequences. Similarly, six consecutive
digits in two sets of three digits each give rise to sixteen classes. On the basis of range,
sets of two or more of the digits 0, 1, 2, ..., 9 may be separated into ten classes.

Chi-square tests applied by the present author on the basis of the foregoing and similar 
classifications have not thus far indicated that the Tippett numbers are not satisfactorily 
random. 
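
The four-class scheme for sets of three digits can be illustrated by enumerating all 1000 equally likely triples, which gives the class probabilities a chi-square test would need. The Python sketch below is an illustration under assumptions: apart from the rule for a, a, a stated in the abstract, the treatment of partial ties (for example a, b, b) as monotone sequences is an assumption of this sketch, and the function names are hypothetical.

```python
from itertools import product

CLASSES = ("middle largest", "middle least", "increasing", "decreasing")

def classify(a, b, c):
    """Assign a triple of digits to one of the four classes."""
    if b > a and b > c:
        return "middle largest"
    if b < a and b < c:
        return "middle least"
    if a == b == c:
        # Rule quoted in the abstract for the sequence a, a, a.
        return "increasing" if a > 4 else "decreasing"
    # Assumption: partial ties count as (weakly) monotone sequences.
    return "increasing" if a <= b <= c else "decreasing"

def expected_probabilities():
    """Class probabilities when the three digits are independent and uniform."""
    counts = dict.fromkeys(CLASSES, 0)
    for triple in product(range(10), repeat=3):
        counts[classify(*triple)] += 1
    return {name: counts[name] / 1000 for name in CLASSES}

def chi_square(observed, probabilities):
    """Pearson chi-square statistic for observed class counts."""
    n = sum(observed.values())
    return sum((observed[name] - n * probabilities[name]) ** 2
               / (n * probabilities[name]) for name in CLASSES)

probabilities = expected_probabilities()
```

Under these assumptions the enumeration gives probabilities 0.285, 0.285, 0.215 and 0.215 for the four classes; the a > 4 rule splits the ten triples a, a, a evenly, which keeps the two monotone classes symmetric. The statistic from `chi_square` would be referred to chi-square with three degrees of freedom.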








Stratified Sampling. A. M. MOOD, University of Texas.


When certain relations between the probabilities $p_1, p_2, \cdots, p_k$ of a multinomial population
are known in advance, the technique of stratified sampling provides more efficient
estimates of the probabilities than does random sampling. Under certain conditions
of stratified sampling, however, the maximum likelihood estimates, $n_i/n$, of $p_i$ are biased
but are unbiased in the limit as the sample size increases. The methods and results of the
theory of maximum likelihood require no modification to be made applicable to the problem 
of estimation in stratified sampling; in fact the results of this theory imply the use of 
stratified sampling when the conditions for its use obtain. 


Advantages of Singling Out Degrees of Freedom in Analyses of Variance. 
WILLIAM DOWELL BATEN, Michigan Agriculture Experiment Station.


This paper pertains to an experiment involving dummy plots for analyzing effects of 
placements and fertilizers for cannery peas. Three fertilizers were used at different distances
from the pea seeds at planting, the design being a randomized block layout. Advantages
are given for breaking up the sum of squares, due to differences between "treatment"
means, into sums of squares, each with one degree of freedom. Methods are given
for securing the sum of squares involving dummy plots, and obtaining the variances due 
to main effects and interaction. Interpretations are given for each phase of the analysis. 
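
The idea of breaking a treatment sum of squares into parts with one degree of freedom each can be sketched in Python with orthogonal contrasts on treatment totals. The totals, the number of replicates, and the particular contrasts below are hypothetical and are not taken from the pea experiment; the sketch only shows that a complete set of orthogonal contrasts reproduces the treatment sum of squares.

```python
def treatment_ss(totals, r):
    """Sum of squares between treatments, from treatment totals with r replicates each."""
    grand = sum(totals)
    return sum(t * t for t in totals) / r - grand * grand / (r * len(totals))

def contrast_ss(totals, coeffs, r):
    """Single-degree-of-freedom sum of squares for one contrast among totals."""
    value = sum(c * t for c, t in zip(coeffs, totals))
    return value * value / (r * sum(c * c for c in coeffs))

# Hypothetical totals for three treatments, four replicates each.
totals = [30.0, 42.0, 36.0]
replicates = 4

# Two mutually orthogonal contrasts: first vs. second treatment,
# and the mean of the first two vs. the third.
contrasts = [(1, -1, 0), (1, 1, -2)]
parts = [contrast_ss(totals, c, replicates) for c in contrasts]
total_between = treatment_ss(totals, replicates)
```

With these figures the first contrast carries the whole treatment sum of squares and the second carries none, the kind of information that is hidden when only the pooled sum of squares is tested.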





